These features have existed for quite a long time.
Solaris 2.6 in 1997 added a kernel asynchronous IO system call that does exactly this - `kaio()`.
One way it can be accessed is via the `lio_listio()` function:
lio_listio
Synopsis
cc [ flag... ] file... -lrt [ library... ]
#include <aio.h>
int lio_listio(int mode, struct aiocb *restrict const list[], int nent,
    struct sigevent *restrict sig);
Description
The `lio_listio()` function allows the calling process, LWP, or thread, to initiate a list of I/O requests within a single function call.
The Illumos `libc` source code, which was open-sourced and is descended from that original Solaris implementation of `lio_listio()`, can be found at https://github.com/illumos/illumos-gate/blob/470204d3561e07978b63600336e8d47cc75387fa/usr/src/lib/libc/port/aio/posix_aio.c#L121
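As a rough illustration of what "a list of I/O requests within a single function call" looks like from the caller's side, here is a minimal sketch using only the POSIX AIO interface. The helper name `batch_read()`, the `BATCH_MAX` cap, and the choice to read consecutive chunks starting at offset 0 are all assumptions made for the example, not anything taken from the Solaris or Illumos sources.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open(), for callers of this sketch */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BATCH_MAX 16    /* illustrative cap chosen for this sketch */

/* Hypothetical helper: read `count` consecutive chunks of `chunk` bytes from
 * `fd`, starting at offset 0, by submitting them all in one lio_listio() call.
 * Returns the total number of bytes read, or -1 if any request failed. */
ssize_t batch_read(int fd, char *buf, size_t chunk, int count)
{
    struct aiocb cbs[BATCH_MAX];
    struct aiocb *list[BATCH_MAX];
    ssize_t total = 0;
    int failed = 0;

    assert(count > 0 && count <= BATCH_MAX);

    memset(cbs, 0, sizeof cbs);
    for (int i = 0; i < count; i++) {
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = buf + (size_t)i * chunk;
        cbs[i].aio_nbytes = chunk;
        cbs[i].aio_offset = (off_t)i * (off_t)chunk;
        cbs[i].aio_lio_opcode = LIO_READ;
        list[i] = &cbs[i];
    }

    /* LIO_WAIT: return only after the requests in the list have completed. */
    if (lio_listio(LIO_WAIT, list, count, NULL) != 0)
        failed = 1;

    /* Collect the status of every request, even after a failure, so that no
     * request is still using the stack-allocated control blocks on return. */
    for (int i = 0; i < count; i++) {
        const struct aiocb *one[1] = { &cbs[i] };
        while (aio_error(&cbs[i]) == EINPROGRESS)
            aio_suspend(one, 1, NULL);
        ssize_t n = aio_return(&cbs[i]);
        if (n < 0)
            failed = 1;
        else
            total += n;
    }
    return failed ? -1 : total;
}
```

A caller would open a file and invoke something like `batch_read(fd, buf, 1024, 4)` to fetch four 1 KiB chunks with one library call. Depending on the platform you may need to link with `-lrt`, as the synopsis above shows. Whether the requests reach the kernel as one system call depends on the implementation underneath; the interface only promises a single function call.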
One reason features like this aren't more common is that they really don't improve performance much unless the entire software and hardware system is designed to take advantage of them.
Storage has to be configured to provide properly aligned blocks, file systems have to be built so they're properly aligned to the blocks the storage system provides, and the entire software stack needs to be written to not screw the IO up - it all has to do properly-aligned IO.
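To make "properly-aligned IO" concrete at the application layer, here is a small sketch of the kind of check a program might apply before issuing a request. The helper names `io_is_aligned()` and `alloc_aligned()` are hypothetical, and the block size is a parameter because the right value depends on the storage and file system in use; the sketch only assumes it is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical check: true when the buffer address, the file offset, and the
 * transfer length are all multiples of `block` (assumed to be a power of two). */
bool io_is_aligned(const void *buf, off_t offset, size_t len, size_t block)
{
    assert(block != 0 && (block & (block - 1)) == 0);
    return (uintptr_t)buf % block == 0
        && offset >= 0
        && (uintmax_t)offset % block == 0
        && len % block == 0;
}

/* Hypothetical allocator: returns a buffer whose address is a multiple of
 * `block`, or NULL on failure. Release it with free(). */
void *alloc_aligned(size_t len, size_t block)
{
    void *p = NULL;
    if (posix_memalign(&p, block, len) != 0)
        return NULL;
    return p;
}
```

This covers only the application's share of the problem. The storage configuration and file system layout described above still have to line up underneath it.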
And with spinning disks, it's easy for a batch of IO operations to the same disk(s) to interfere with each other and actually slow everything down as the head(s) spend more time seeking.
And in my experience all it takes is one of the layers doing things wrong for the performance advantage of batched system calls to disappear into the overhead, because IO is slow compared to even the worst system call overhead.
The cost of creating and maintaining a combined hardware/software system to take advantage of the performance improvement batched IO system calls offer is immense.
And the best numbers I've ever seen are that batching many IO calls into one system call can improve performance by about 25-30%.
If you're processing hundreds of GB of data continuously around the clock, that matters.
Building and maintaining an entire system like that just to lower the latency of viewing cat videos from 8 ms to 6 ms? Not so much.
`readv`/`writev` family of syscalls comes to mind as something along those lines (and those are pretty old). – Sebastian Riese Feb 12 '23 at 16:46
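For reference, a minimal sketch of the `readv`/`writev` style of batching mentioned in the comment. The helper names `write_record()` and `read_record()` and the header-plus-body layout are assumptions for illustration. Note that these calls gather or scatter several buffers for one descriptor in one call, which is narrower than submitting a list of independent requests.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: write a header and a body with one writev() call
 * instead of two write() calls. Returns bytes written, or -1 on error. */
ssize_t write_record(int fd, const void *hdr, size_t hdr_len,
                     const void *body, size_t body_len)
{
    struct iovec iov[2];

    assert(hdr != NULL && body != NULL);
    iov[0].iov_base = (void *)hdr;   /* writev() does not modify the data */
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = body_len;
    return writev(fd, iov, 2);
}

/* Hypothetical helper: read into two separate buffers with one readv() call.
 * Returns bytes read, or -1 on error. */
ssize_t read_record(int fd, void *hdr, size_t hdr_len,
                    void *body, size_t body_len)
{
    struct iovec iov[2];

    assert(hdr != NULL && body != NULL);
    iov[0].iov_base = hdr;
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = body;
    iov[1].iov_len = body_len;
    return readv(fd, iov, 2);
}
```

As with plain `read()` and `write()`, either call may transfer fewer bytes than requested, so real code would loop on short transfers.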