I need to download and decompress a file as quickly as possible in a very latency-sensitive environment with limited resources (a VM with 1 CPU, 2 cores, 128 MB RAM).
Naturally, I tried to pipe the download into the decompression, on the assumption that I could decompress while the download is still running. I know that a pipe is throttled by its slowest process, so to overcome this I put a buffer between the download and the decompression.
My shell script looks something like this:
curl -s $CACHE_URL | buffer -S 100M | lz4 -d > /tmp/myfile
If I first download the compressed file and then decompress it without piping, the download takes about 250ms and the decompression takes about 250ms when run sequentially.
My assumption was therefore that the piped approach should take around 250-275ms, since there is no additional disk read in between, and the download isn't CPU bound like the decompression, so it shouldn't slow it down much.
But it doesn't. It's barely any faster, as my logs show:
Start download
35211K, 81131K/s
Download & decompressed done in 447ms
Starting individual download & decompress
Download done in 234ms
Decompressed : 61 MiB
/tmp/myfile : decoded 75691880 bytes
Decompress done in 230ms
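For reference, the two cases can be timed with something along these lines (a sketch; /tmp/myfile.lz4 is just an example name for the downloaded archive):

# piped: download and decompression overlap
time (curl -s $CACHE_URL | buffer -S 100M | lz4 -d > /tmp/myfile)

# sequential: download first, then decompress
time curl -s $CACHE_URL -o /tmp/myfile.lz4
time lz4 -d -f /tmp/myfile.lz4 /tmp/myfile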
Am I thinking wrong here? Is there any other way to speed this up?
buffer, assuming that's the one found on Debian, only buffers up to 1MiB by default and can be told to buffer up to 2048 blocks (of 10KiB by default). Try buffer -s 16K -m 32M to buffer up to 32MiB, or pv -qB100M to buffer up to 100MiB (also avoiding the reblocking that buffer does). – Stéphane Chazelas Apr 23 '23 at 07:04
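Applied to the pipeline in the question, the pv variant from that comment would look roughly like this (pv needs to be installed):

curl -s $CACHE_URL | pv -qB 100M | lz4 -d > /tmp/myfile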
top and press 1 should show you all CPU usages separately. The download is bandwidth limited, the decompress is CPU limited, and also I/O limited but mitigated by cache. – Paul_Pedant Apr 23 '23 at 17:05
write() will block, yes, but a blocked writer won't affect the reader. The reader (the decompressor) would never block waiting on data because the writer (the downloader) outpaces it; there would always be data in the pipe available for the reader to consume. Having the downloader fetch and store the data to some secondary in-memory buffer won't make the reader finish any faster (and won't make the process, as a whole, finish any faster). – Andy Dalton Apr 28 '23 at 18:20
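A quick way to see that effect (purely an illustration, assuming pv is available): throttle the reading side to 50 MB/s and note that the end-to-end time barely changes whether or not a large buffer sits in the middle, because the throttled reader stays the bottleneck:

# fast writer, big intermediate buffer, throttled reader
time (head -c 100M /dev/zero | pv -q -B 100M | pv -q -L 50M > /dev/null)
# same, without the intermediate buffer
time (head -c 100M /dev/zero | pv -q -L 50M > /dev/null)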
Use taskset(1) to see how the involved processes are distributed, and perhaps change their CPU affinity on-the-fly. I guess assigning curl and buffer's reader process to one CPU, while buffer's writer process and lz4 go to the other CPU, may be a worthy attempt. Besides, buffer may not be the right tool for this job, unless perhaps you make it reblock friendlily for lz4. Also, raising the curl process's nice value might help a bit further. – LL3 Apr 28 '23 at 20:39
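As a rough sketch of that idea (CPU numbers 0 and 1 and the nice value 5 are arbitrary; buffer's two processes would have to be re-pinned after startup once their PIDs are known):

# pin curl (with a raised nice value) to CPU 0 and lz4 to CPU 1
taskset -c 0 nice -n 5 curl -s $CACHE_URL | buffer -S 100M | taskset -c 1 lz4 -d > /tmp/myfile

# inspect or change the affinity of an already-running process (PID 12345 is a placeholder)
taskset -cp 12345        # show the CPU list PID 12345 is allowed to run on
taskset -cp 0 12345      # move PID 12345 to CPU 0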
-S 100M does not set the buffer size, it instructs buffer to print out progress info every 100MB. – LL3 Apr 30 '23 at 09:10
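So, to get an actual in-memory buffer with buffer, its size has to be set with -m (plus a larger block size via -s), along the lines suggested in the first comment:

curl -s $CACHE_URL | buffer -s 16K -m 32M | lz4 -d > /tmp/myfile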