110

I have a binary (that I can't modify) and I can do:

./binary < file

I also can do:

./binary << EOF
> "line 1 of file"
> "line 2 of file"
...
> "last line of file"
> EOF

But

cat file | ./binary

gives me an error. I don't know why it doesn't work with a pipe. In all 3 cases the content of file is given to the standard input of binary (in different ways):

  1. bash reads the file and gives it to stdin of binary
  2. bash reads lines from stdin (until EOF) and gives it to stdin of binary
  3. cat reads and puts the lines of file to stdout, bash redirects them to stdin of binary

The binary shouldn't notice the difference between those 3 as far as I understood it. Can someone explain why the 3rd case doesn't work?

BTW: The error given by the binary is:

20170116/125624.689 - U3000011 Could not read script file '', error code '14'.

But my main question is: how can any program tell the difference between these 3 options?

Here are some further details: I tried it again with strace, and there were in fact some ESPIPE (Illegal seek) errors from lseek, followed by an EFAULT (Bad address) error from read, right before the error message.

The binary I tried to control with a ruby script (without using temporary files) is part of the callapi from Automic (UC4).

Jeff Schaller
Boris
  • 2
    what's the error that you get with the cat version? – Jeff Schaller Jan 16 '17 at 11:52
  • 20170116/125624.689 - U3000011 Could not read script file '', error code '14'. – Boris Jan 16 '17 at 11:56
  • 1
    that should be edited into the question so that it's not lost – Jeff Schaller Jan 16 '17 at 11:56
  • 1
    To some extent, there is a difference "between those 3", but those are deep details we don't need to dive into. Look into the documentation for your binary. Does it mention anything about reading from pipes? Also, it would help if you at least mentioned what software/binary you're using. – Sergiy Kolodyazhnyy Jan 16 '17 at 12:00
  • 30
    Cool, there is an UUOC detector embedded in your binary. I want it. – xhienne Jan 16 '17 at 12:15
  • 5
    What OS is it (so we can tell what 14 is if it's meant to be an errno)? – Stéphane Chazelas Jan 16 '17 at 12:22
  • 2
    @xhienne Here, take my seek(). – Jens Jan 16 '17 at 14:43
  • 6
    Even though it is possible for a program to react this way, it would be a strangely buggy one that did. Every non-crazy program that expects any input from stdin at all needs to work when stdin is a tty, and if it can work with both a tty and a file, there is little reason not to support pipes too. Probably the author of the program had a temporary hemorrhage and thought that anything that isatty() returns false for will be a seekable or mmappable file ... – hmakholm left over Monica Jan 16 '17 at 14:55
  • Randal Schwartz used to give out "Useless use of cat" Awards :) http://porkmail.org/era/unix/award.html – brian d foy Jan 17 '17 at 14:58
  • 9
    Error code 14 stands for EFAULT. On a read that occurs if the buffer you have declared is invalid. I would strace the program but I suspect it is seeking to the end of file to get a buffer size for reading the data, badly handling the fact that seek doesn't work and attempting to allocate a negative size (not handling a bad malloc). Passing the buffer to read which faults given the buffer is not valid. – Matthew Ife Jan 17 '17 at 15:07
  • 4
    @xhienne No, it has a cat preventer in it. It appears that you couldn't use it to combine two files, as is the intended usage. – jpmc26 Jan 17 '17 at 19:37
  • @HenningMakholm I have seen different behavior piping versus reading from stdin when using "strings" on a binary file. – Michael Jan 17 '17 at 21:22

4 Answers

158

In

./binary < file

binary's stdin is the file open in read-only mode. Note that bash doesn't read the file at all, it just opens it for reading on the file descriptor 0 (stdin) of the process it executes binary in.

In:

./binary << EOF
test
EOF

Depending on the shell, binary's stdin will be either a deleted temporary file (AT&T ksh, zsh, bash...) that contains test\n as put there by the shell or the reading end of a pipe (dash, yash; and the shell writes test\n in parallel at the other end of the pipe). In your case, if you're using bash, it would be a temp file.

In:

cat file | ./binary

Depending on the shell, binary's stdin will be either the reading end of a pipe, or one end of a socket pair where the writing direction has been shut down (ksh93) and cat is writing the content of file at the other end.

When stdin is a regular file (temporary or not), it is seekable. binary may go to the beginning or end, rewind, etc. It can also mmap it, do some ioctl()s like FIEMAP/FIBMAP (if using <> instead of <, it could truncate/punch holes in it, etc).

Pipes and socket pairs, on the other hand, are a means of inter-process communication; there's not much binary can do besides reading the data (though there are also some operations, like pipe-specific ioctl()s, that it could do on them and not on regular files).

Most of the time, it's the missing ability to seek that causes applications to fail/complain when working with pipes, but it could be any of the other system calls that are valid on regular files but not on different types of files (like mmap(), ftruncate(), fallocate()). On Linux, there's also a big difference in behaviour when you open /dev/stdin while the fd 0 is on a pipe or on a regular file.

There are many commands out there that can only deal with seekable files, but when that's the case, that's generally not for the files open on their stdin.

$ unzip -l file.zip
Archive:  file.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
       11  2016-12-21 14:43   file
---------                     -------
       11                     1 file
$ unzip -l <(cat file.zip)
     # more or less the same as cat file.zip | unzip -l /dev/stdin
Archive:  /proc/self/fd/11
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /proc/self/fd/11 or
        /proc/self/fd/11.zip, and cannot find /proc/self/fd/11.ZIP, period.

unzip needs to read the index stored at the end of the file, and then seek within the file to read the archive members. But here, the file (regular in the first case, pipe in the second) is given as a path argument to unzip, and unzip opens it itself (typically on fd other than 0) instead of inheriting a fd already opened by the caller. It doesn't read zip files from its stdin. stdin is mostly used for user interaction.

If you run that binary of yours without redirection at the prompt of an interactive shell running in a terminal emulator, then binary's stdin will be inherited from its caller the shell, which itself will have inherited it from its caller the terminal emulator and will be a pty device open in read+write mode (something like /dev/pts/n).

Those devices are not seekable either. So, if binary works OK when taking input from the terminal, possibly the issue is not about seeking.

If that 14 is meant to be an errno (an error code set by failing system calls), then on most systems, that would be EFAULT (Bad address). The read() system call would fail with that error if asked to read into a memory address that is not writable. That would be independent of whether the fd to read the data from points to a pipe or regular file and would generally indicate a bug [1].

binary possibly determines the type of file open on its stdin (with fstat()) and runs into a bug when it's neither a regular file nor a tty device.

It's hard to tell without knowing more about the application. Running it under strace (or the truss/tusc equivalent on your system) could help us see which system call, if any, is failing here.


[1] The scenario envisaged by Matthew Ife in a comment to your question sounds quite plausible here. Quoting him:

I suspect it is seeking to the end of file to get a buffer size for reading the data, badly handling the fact that seek doesn't work and attempting to allocate a negative size (not handling a bad malloc). Passing the buffer to read which faults given the buffer is not valid.

  • 15
    Very interesting... this is the first I've heard that redirected standard input in the style of ./binary < file is seekable! – David Z Jan 16 '17 at 19:45
  • 3
    @DavidZ it's a file that's been opened and it behaves the same as any file that's been opened. It just happens to have been inherited from a parent process, but that's not so uncommon. – hobbs Jan 17 '17 at 00:32
  • 3
    If the system contains strace or a similar tool it could be used to check on which system call the binary fails. – pabouk - Ukraine stay strong Jan 17 '17 at 08:37
  • @pabouk, good point. Added to the answer. – Stéphane Chazelas Jan 17 '17 at 09:45
  • 2
    "It can also truncate it, mmap it, punch holes in it etc." - Well, no. The file is open in read-only mode. The program would have to open it in write mode to do that. But it can't open it in write mode, because there's no interface for doing that directly, nor is there any interface for finding "the" directory entry that corresponds to an open file (what if there's two such dentries, or zero?). It would have to stat the file and then scan the filesystem for an object with the same inode number. That would be inordinately slow. – Kevin Jan 18 '17 at 03:48
  • @Kevin, d'oh. Silly me. Fixed now. – Stéphane Chazelas Jan 18 '17 at 08:16
  • @Kevin: Linux's dup3() system call looks like it might work, but O_RDWR isn't supported as a valid flag. I tried it; with stdin redirected from a file I own with 0664 permissions, strace says dup3(0, 5, 0x2 /* O_??? */) = -1 EINVAL (Invalid argument). Interesting. I supposed there might be security reasons for not providing an API for opening a new fd onto the inode from an existing fd, since passing already-open read-only fds to less-privileged contexts is a thing. Similar problems to re-linking a deleted inode. – Peter Cordes Jan 18 '17 at 22:53
  • @PeterCordes, inodes and fds are concepts that are orthogonal. In any case, on Linux, there's /proc/self/fd – Stéphane Chazelas Jan 18 '17 at 22:59
  • 1
    @StéphaneChazelas: oh right, open("/proc/self/fd/0", O_RDWR) works, even on deleted files. Silly me :P. echo foo>foo; (sleep 0.5; ll -L /proc/self/fd/0; strace ./a.out; ll -L /proc/self/fd/0) < foo & sleep 0.1 && rm foo unlinks foo before a.out runs with its stdin redirected from foo. – Peter Cordes Jan 18 '17 at 23:37
  • Every open fd is associated with an inode, right? Even pipes and sockets have some kind of kernel internal inode number which you can see with fstat. Opening a new fd with a different mode requires a permissions check on the inode, so any API for doing so must look at the inode referenced by the original fd. – Peter Cordes Jan 18 '17 at 23:41
  • 1
    @PeterCordes, yes, you're right. Note that the dup() family returns a fd pointing to the same open file description, it's not a new independent fd associated to the same file. Changing the O_RDONLY/O_RDWR/O_APPEND would not make sense. The flags that dup3() changes are those that pertain to the fd (like O_CLOEXEC), not the ones on the open file description. Note that on systems other than Linux, opening /dev/fd/x works like dup(). Linux is the exception here. – Stéphane Chazelas Jan 19 '17 at 09:33
  • Oh right, I'd forgotten that the duplicated fd is not independent. If there was a real need for it, dup3 could in theory optionally create an independent fd depending on flags, maybe with an O_REOPEN flag. But it's a fundamentally different operation which probably wouldn't share much implementation code with dup3, so a new syscall would be cleaner. Good thing open("/proc/self/fd/0") works the way it does on Linux, since that's a clean way to implement it with only a bit more overhead (trivial string manipulation and then the kernel parsing a pathname and VFS lookups). – Peter Cordes Jan 19 '17 at 10:06
47

Here's a simple example program that illustrates Stéphane Chazelas' answer using lseek(2) on its input:

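#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int c;
    off_t off;

    /* Skip the first 10 bytes of stdin; this fails with ESPIPE on a pipe. */
    off = lseek(STDIN_FILENO, 10, SEEK_SET);
    if (off == (off_t)-1)
    {
        perror("Error");
        return 1;
    }
    /* Print the 11th byte of the input. */
    c = getchar();
    printf("%c\n", c);
    return 0;
}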

Testing:

$ make seek
cc     seek.c   -o seek
$ cat foo
abcdefghijklmnopqrstuvwxyz
$ ./seek < foo
k
$ ./seek <<EOF
> abcdefghijklmnopqrstuvwxyz
> EOF
k
$ cat foo | ./seek
Error: Illegal seek

Pipes are not seekable, and that's one place where a program might complain about pipes.

muru
21

Pipes and redirections are different animals, so to speak. When you use a here-doc redirection (<<) or redirect stdin with <, the text doesn't come out of thin air: the shell opens a file (a temporary one, in the here-doc case) on a file descriptor, and that is where the binary's stdin will be pointing.

Specifically, here's an excerpt from bash's source code, redir.c file (version 4.3):

/* Create a temporary file holding the text of the here document pointed to
   by REDIRECTEE, and return a file descriptor open for reading to the temp
   file.  Return -1 on any error, and make sure errno is set appropriately. */
static int
here_document_to_fd (redirectee, ri)

So since redirections can basically be treated as files, the binaries can navigate them, seeking through the file with lseek() and jumping to any byte of it.

Pipes, since they are buffers of 64 KiB (at least on Linux) with writes of 4096 bytes or less guaranteed to be atomic, aren't seekable, i.e. you cannot freely navigate them, only read them sequentially. I once implemented the tail command in Python. With a redirected file, the end of 29 million lines of text can be reached by seeking in microseconds, but if the data is cat'ed through a pipe, well, there's nothing that can be done: it all has to be read sequentially.

Another possibility is that the binary might want to open a file specifically, and doesn't want to receive input from a pipe. It's usually done via fstat() system call, and checking if the input comes from a S_ISFIFO type of file (which signifies a pipe/named pipe).

Your specific binary, since we don't know what it is, probably attempts seeking, but cannot seek pipes. It is recommended you consult its documentation to find out what exactly error code 14 means.

NOTE: Some shells, such as dash (Debian Almquist Shell, the default /bin/sh on Ubuntu), implement here-doc redirection with pipes internally, so the input may not be seekable. The point remains the same: pipes are sequential and cannot be navigated easily, and attempts to do so will result in errors.

  • Stephane's answer says that here-docs can be implemented with pipes, and that some common shells like dash do so. This answer explains the observed behaviour with bash, but that behaviour apparently isn't guaranteed across other shells. – Peter Cordes Jan 18 '17 at 22:55
  • @PeterCordes that is absolutely so , and I just verified it with dash on my system. I wasn't aware of that previously. Thanks for pointing out – Sergiy Kolodyazhnyy Jan 18 '17 at 23:12
  • Another comment: you'd use fstat() on stdin to check if it's a pipe. stat takes a pathname. But really, just attempting to lseek is probably the sanest way to determine if an fd is seekable after it's already open. – Peter Cordes Jan 18 '17 at 23:26
6

The main difference is in the error handling.

In the following case the error is reported

$ /bin/cat < z.txt
-bash: z.txt: No such file or directory
$ echo $?
1

In the following case the error is not reported.

$ cat z.txt | /bin/cat
cat: z.txt: No such file or directory
$ echo $?
0

With bash, you can still use PIPESTATUS:

$ cat z.txt | /bin/cat
cat: z.txt: No such file or directory
$ echo ${PIPESTATUS[0]}
1

But it is available only immediately after the execution of the command:

$ cat z.txt | /bin/cat
cat: z.txt: No such file or directory
$ echo $?
0
$ echo ${PIPESTATUS[0]}
0
# oops !

There is another difference when we use shell functions instead of binaries. In bash, functions that are part of a pipeline are executed in sub-shells (except for the last pipeline component if the lastpipe option is enabled and bash is non-interactive), so changes to variables have no effect in the parent shell:

$ a=a
$ b=b
$ x(){ a=x;}
$ y(){ b=y;}

$ echo $a $b
a b

$ x | y
$ echo $a $b
a b

$ cat t.txt | y
$ echo $a $b
a b

$ x | cat
$ echo $a $b
a b

$ x < t.txt
$ y < t.txt
$ echo $a $b
x y
Vouze
  • 5
    So, you're showing that error handling with < is done by the shell, but with a pipe it's done by the command that produces the text. OK. But in this specific question, OP is using an existing file, so that's not the issue, and clearly the error is produced by the binary. – Sergiy Kolodyazhnyy Jan 17 '17 at 01:21
  • 1
    While it is mostly beside the point, this answer does have some relevance to this Q&A in the general case and is mostly correct, so I don't think it deserves those downvotes. – Stéphane Chazelas Jan 17 '17 at 17:22
  • @Serg : When you use shell as a command line, this is not important. But in scripts, the handling of errors can be very important. – Vouze Jan 25 '17 at 13:36