
Based on an answer to another question, I am using curl to stream the stdout of one process as the entity of a POST request:

myDataGeneratingApp \
| curl -H "Content-Type: application/json" -H "Transfer-Encoding: chunked" -X POST -d @- http://localhost:12000

Unfortunately, curl waits for EOF on its standard input before it begins sending the data. I know this because I can run my application stand-alone and data appears on the console immediately, but when I pipe it to curl there is a significant delay before the service begins receiving data.

How can I make curl stream the data immediately as it becomes available on the application's standard output? If that's not possible with curl, is there another solution (e.g. wget)?

Rui F Ribeiro
  • at what point in the stream should curl consider it enough to send? – Jeff Schaller Sep 13 '18 at 16:50
    @JeffSchaller The entire contents should be posted but in a streaming fashion. As bytes are available from stdout of myDataGeneratingApp they should be sent "across the wire" to the server. The request completes when the stdout completes... – Ramón J Romero y Vigil Sep 13 '18 at 16:51
  • apart from the buffering, is your data respecting the chunked protocol by preceding each chunk with a hex length? – meuh Sep 13 '18 at 18:56
  • did you try curl --no-buffer (-N)? – meuh Sep 13 '18 at 19:12
  • @meuh Yes, unfortunately that only applies to the response going to the output stream not the input stream. – Ramón J Romero y Vigil Sep 13 '18 at 20:05
  • the console is typically line-buffered (see setbuf(3)), while writes to a pipe are block-buffered by default. You'd have to disable that buffering (in both myDataGeneratingApp and probably also curl), or write something very custom to feed the web service – thrig Sep 13 '18 at 21:19
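thrig's buffering point can be demonstrated without curl. A sketch (assuming GNU coreutils for stdbuf): tr, like most stdio programs, block-buffers its output when stdout is a pipe, so the downstream stage sees nothing until a buffer fills or EOF; stdbuf -oL forces line buffering so each line is passed on immediately:

```shell
# The producer emits one line per second. Without stdbuf, tr would
# block-buffer its output to the pipe and all three timestamped lines
# would appear at once after ~3 seconds; with stdbuf -oL each line is
# flushed immediately and the timestamps are ~1 second apart.
seq 3 | while read -r i; do echo "line $i"; sleep 1; done \
  | stdbuf -oL tr 'a-z' 'A-Z' \
  | while IFS= read -r l; do printf '%s %s\n' "$(date +%T)" "$l"; done
```

The same trick applies to the question's pipeline: `stdbuf -oL myDataGeneratingApp | curl …` forces line-buffered output from the generator, though curl's own reading behavior still matters, as the answers below discuss.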

2 Answers


Looking through the curl source (transfer.c), it seems the program is able to repackage request data (from curl to the server) using the chunked protocol, where each chunk of data is prefixed by its length in ASCII hexadecimal and suffixed by \r\n.
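The framing is simple enough to build by hand. As an illustration, here is what one chunk plus the terminating zero-length chunk looks like on the wire (the 5-byte body "hello" is chosen arbitrarily):

```shell
# One chunk: hex length, CRLF, data, CRLF; the body then ends with a
# zero-length chunk ("0" CRLF CRLF).
body="hello"
len=$(printf '%x' "${#body}")   # 5 bytes -> "5"
printf '%s\r\n%s\r\n0\r\n\r\n' "$len" "$body" | od -c
```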

It seems the way to make curl use this in a streaming fashion, after connecting to the server, is with -T -. Consider this example:

for i in $(seq 5)
do date
   sleep 1
done | 
dd conv=block cbs=512 |
strace -t -e sendto,read -o /tmp/e \
 curl --trace-ascii - \
 -H "Transfer-Encoding: chunked" \
 -H "Content-Type: application/json" \
 -X POST -T -  http://localhost/...

This script sends 5 blocks of data, each beginning with the date and padded to 512 bytes by dd, into a pipe, where strace runs curl -T - to read the pipe. In the terminal we can see:

== Info: Connected to localhost (::1) port 80 (#0)
=> Send header, 169 bytes (0xa9)
0000: POST /... HTTP/1.1
001e: Host: localhost
002f: User-Agent: curl/7.47.1
0048: Accept: */*
0055: Transfer-Encoding: chunked
0071: Content-Type: application/json
0091: Expect: 100-continue
00a7: 
<= Recv header, 23 bytes (0x17)
0000: HTTP/1.1 100 Continue

which shows the connection and the headers sent. In particular, curl has not provided a Content-Length: header, but an Expect: header, to which the server (Apache) has replied Continue. Immediately after come the first 512 bytes (200 in hex) of data:

=> Send data, 519 bytes (0x207)
0000: 200
0005: Fri Sep 14 15:58:15 CEST 2018                                   
0045:                                                                 
0085:                                                                 
00c5:                                                                 
0105:                                                                 
0145:                                                                 
0185:                                                                 
01c5:                                                                 
=> Send data, 519 bytes (0x207)

Looking in the strace output file we see each timestamped read from the pipe, and sendto write to the connection:

16:00:00 read(0, "Fri Sep 14 16:00:00 CEST 2018   "..., 16372) = 512
16:00:00 sendto(3, "200\r\nFri Sep 14 16:00:00 CEST 20"..., 519, ...) = 519
16:00:00 read(0, "Fri Sep 14 16:00:01 CEST 2018   "..., 16372) = 512
16:00:01 sendto(3, "200\r\nFri Sep 14 16:00:01 CEST 20"..., 519, ...) = 519
16:00:01 read(0, "Fri Sep 14 16:00:02 CEST 2018   "..., 16372) = 512
16:00:02 sendto(3, "200\r\nFri Sep 14 16:00:02 CEST 20"..., 519, ...) = 519
16:00:02 read(0, "Fri Sep 14 16:00:03 CEST 2018   "..., 16372) = 512
16:00:03 sendto(3, "200\r\nFri Sep 14 16:00:03 CEST 20"..., 519, ...) = 519
16:00:03 read(0, "Fri Sep 14 16:00:04 CEST 2018   "..., 16372) = 512
16:00:04 sendto(3, "200\r\nFri Sep 14 16:00:04 CEST 20"..., 519, ...) = 519
16:00:04 read(0, "", 16372)             = 0
16:00:05 sendto(3, "0\r\n\r\n", 5, ...) = 5

As you can see, they are spaced out by 1 second, showing that the data is sent as it is received. You just need at least 512 bytes available at a time, as the data is read by fread().
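Applied to the command in the question, the fix amounts to replacing -d @- (which reads all of stdin before sending anything) with -T - (same hypothetical myDataGeneratingApp and URL as in the question):

```shell
# -T - uploads from stdin as the data arrives; -X POST keeps the method
# POST, since -T alone would default to PUT.
myDataGeneratingApp \
  | curl -H "Content-Type: application/json" \
         -H "Transfer-Encoding: chunked" \
         -X POST -T - http://localhost:12000
```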

meuh

See Edit below

What you want is not possible. To send the POST data, the length must be known, so curl must first read your whole data to determine the length.

Transfer-Encoding: chunked is a way around that restriction, but just for the response from the server.

The reason is that chunked is only supported in HTTP/1.1, but when sending the request the client can't know whether the server understands HTTP/1.1 or not. That information comes with the response, but that is too late for sending the request.

Edit

This seems to be a limitation in wget as well; from the wget manual:

Please be aware that Wget needs to know the size of the POST data in advance. Therefore the argument to --post-file must be a regular file; specifying a FIFO or something like /dev/stdin won’t work. It’s not quite clear how to work around this limitation inherent in HTTP/1.0. Although HTTP/1.1 introduces chunked transfer that doesn’t require knowing the request length in advance, a client can’t use chunked unless it knows it’s talking to an HTTP/1.1 server. And it can’t know that until it receives a response, which in turn requires the request to have been completed – a chicken-and-egg problem.

The problem is real, but it is recognized and addressed in RFC 7230:

A client MUST NOT send a request containing Transfer-Encoding unless it knows the server will handle HTTP/1.1 (or later) requests; such knowledge might be in the form of specific user configuration or by remembering the version of a prior received response.

So sending chunked POST data is possible, and as the other answer shows, curl already supports it.

RalfFriedl
  • Please add some reference (e.g. an RFC) for your claim that Transfer-Encoding: chunked is only to be used in responses. As to your last sentence, a client already assumes that the server supports HTTP/1.1 when it sends the request (GET /foo HTTP/1.1), and it should be prepared to receive an error if that's not the case. –  Sep 14 '18 at 06:23
  • So would the --http1.1 flag, from version 7.33.0, solve the problem? – Ramón J Romero y Vigil Sep 14 '18 at 10:17