I've been reading about CONFIG_WBT, and also BFQ. I tried comparing WBT vs. CFQ on my hard drive. I learned that CFQ tries to control Linux's massive async writeback, but its success is limited by the hard drive's writeback cache. Disabling the hardware write cache (while leaving NCQ enabled) on my drive allowed much better control.[1]
[1] Determine the specific benefit of Writeback Throttling (CONFIG_WBT)
I know WBT is nowadays disabled on CFQ/BFQ. Also, since upstream Linux v4.19 is pushing blk-mq as the default for SCSI, distributions such as Fedora need to switch from CFQ to BFQ by default, switch back to the "legacy" block layer, or choose something else, according to their evaluations. So I would like to understand BFQ.
I read that BFQ has two hardware-side heuristics. It "overcharges" writes by 10x to mitigate the effect of the device write cache. It also tries to mitigate the effect of NCQ by idling. For now, I am most confused by the write overcharge.
To keep low the ratio between the number of write requests and the number of read requests served, we just added a write (over)charge coefficient: for each sector written, the budget of the active application is decremented by this coefficient instead of one. As shown by our experimental results, a coefficient equal to ten proved effective in guaranteeing high throughput and low latency.
http://algo.ing.unimo.it/people/paolo/disk_sched/bf1-v1-suite-results.pdf
/*
 * Async to sync throughput distribution is controlled as follows:
 * when an async request is served, the entity is charged the number
 * of sectors of the request, multiplied by the factor below
 */
static const int bfq_async_charge_factor = 10;
https://elixir.bootlin.com/linux/v4.18/source/block/bfq-iosched.c#L190
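To make the arithmetic concrete, here is a minimal userspace sketch of how I read that factor. It is not the kernel code: `charged_sectors` and `nominal_async_share` are hypothetical helpers, and the sketch assumes the factor is applied to every async request, ignoring any further conditions the scheduler may check. It also assumes one sync queue and one async queue of equal weight that each receive the same amount of charged service.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Same value as bfq_async_charge_factor in block/bfq-iosched.c (v4.18). */
static const unsigned long async_charge_factor = 10;

/*
 * Hypothetical helper, not a kernel function: number of sectors charged
 * against a queue's budget for one request of `sectors` sectors.
 * Sync requests are charged their real size; async requests are
 * charged their size multiplied by the factor.
 */
static unsigned long charged_sectors(unsigned long sectors, bool sync)
{
	/* Guard against overflow in the multiplication below. */
	assert(sectors <= ULONG_MAX / async_charge_factor);
	return sync ? sectors : sectors * async_charge_factor;
}

/*
 * Hypothetical helper: nominal fraction of the real sectors served that
 * goes to the async queue, assuming one sync and one async queue that
 * receive equal charged service. For every sector the async queue
 * transfers, the sync queue transfers async_charge_factor sectors.
 */
static double nominal_async_share(void)
{
	return 1.0 / (1.0 + (double)async_charge_factor);
}
```

Under those assumptions, `charged_sectors(8, false)` is 80 while `charged_sectors(8, true)` is 8, and `nominal_async_share()` is 1/11, so an async writer competing with a sync reader would nominally get about 9% of the sectors served.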
(I don't see any code in BFQ to disable this factor when writeback caching is disabled. I see WBT included some code to track whether writeback caching is enabled, for very similar reasons. In principle I assume BFQ could do the same thing, but right now it seems BFQ will always overcharge writes, even though BFQ only claims it is needed on devices with writeback caching.)
This says async writes will be given a much lower share of the device throughput. Is there a simple test case to observe this "unfair" share? Or am I misunderstanding?
My link above included a quick test of BFQ. This was a simultaneous read vs. write test with basically default fio settings. I think BFQ gave the reader and writer something much closer to a "fair" share. (The reader achieved 40MB/s on my hard drive.)
 
    
--ioengine=libaio --direct=1 --iodepth=<sensible number here> etc. so you know you aren't fighting cache effects too. The other thing to check would be what fio reports as the latency of your different jobs... – Anon Feb 04 '19 at 05:33