Edge case - detecting input on STDIN in perl

Question

I don't know quite how to ask this question and I'm not even sure this is the place to ask it. It seems rather complex and I don't have a full understanding of what is going on. Frankly, that's why I'm posting - to get some help wrapping my head around this. My end goal is to learn, not to solve my overall problem. I want to understand when I can expect to encounter the situation I'm about to describe and why it happens.

I have a perl module which I've been developing. One of the things it does is detect whether there is input on standard input, whether that's via a pipe or via a redirect (i.e. <).

To catch redirects, I employ a few different checks for various cases. One of them is looking for 0r file descriptors in lsof output. It works fairly well and I use my module in a lot of scripts without issue, but I have 1 use-case where my script thinks it's getting input on STDIN when it is not - and it has to do with what I'm getting in the lsof output. Here are the conditions I have narrowed down the case to, but these are not all the requirements - I'm missing something. Regardless, these conditions seem to be required, but take my intuition with a hefty grain of salt, because I really don't know how to make it happen in a toy example - I have tried - which is why I know I'm missing something:

When I run a perl script from within a perl script via backticks (the inner script is the one that thinks it has been intentionally fed input on STDIN when it has not, though I should point out that I don't know whether it's the parent or child that actually opened that handle)
An input file is supplied to the inner script call that resides in a subdirectory

The file with the 0r file descriptor that lsof is reporting is:

/Library/Perl/5.18/AppendToPath

This file does not show up in the lsof output under other conditions. And if I do eof(STDIN) before and after the lsof call, the result is 1 each time. -t STDIN is undefined. fileno(STDIN) is 0.

I read about this file here and if I cat it, it has:

>cat /Library/Perl/5.18/AppendToPath
/System/Library/Perl/Extras/5.18

It appears this is a macOS-perl-specific file meant to append to perl's @INC path, but I don't know if other OSes provide analogous mechanisms.

I'd like to know more about when that file is present/opened and when it's closed. Can I close it? It seems like the file content has already been read in by the interpreter maybe - so why is it hanging around in my script as an open file handle? Why is it on STDIN? What happens in this case when I actually redirect a file in myself? Is the child process somehow inheriting it from the parent under some circumstance I'm unaware of?

UPDATE: I figured out a third (possibly final) requirement needed to make that AppendToPath file handle be open on STDIN during script execution of the child script. It turns out I had a line of code at the top of the parent script (probably added to try and solve a similar problem when I knew even less than I know now about detecting input on STDIN) that was closing STDIN. I commented out that close and everything started working without any need to exclude that weird file (i.e. that file: /Library/Perl/5.18/AppendToPath no longer shows as open on STDIN in lsof). This was the code I commented out:

close(STDIN) if(defined(fileno(STDIN)) && fileno(STDIN) ne '' &&
                fileno(STDIN) > -1);

It had a comment above it that read:

#Prevent the passing of active standard in handles to the calls to the script
#being tested by closing STDIN.

So I was probably learning about standard input detection at the time I wrote that years ago. My module probably ended up using -t STDIN and -f STDIN, etc., but I'd switched those out for lsof to work around a problem like this one, so I could see better what was going on. So the current module (using either lsof or my new(/reverted?) streamlined version using -t/-f/-p) works just fine (as intended) when I don't close STDIN in the parent.

However, I would still like to understand why that file is on STDIN in a child process when the parent closes STDIN...

Stéphane Chazelas · Answer 1 · 2021-08-12T04:56:20.977

2

If your script is invoked from an interactive shell by some user without redirection, as in:

your-script with args

your script will inherit the stdin of the shell, that will be a tty device most likely open in read + write mode.

If the user invokes it as:

your-script with args < some-file

fd 0 will be open in read-only mode on some-file (of any type; if they do < /dev/pts/0, that will be a tty device as well; if it's a fifo file, stdin will appear as being from a pipe; with < /dev/null, that will be that other character device, etc.).

With:

your-script with args <> some-file

That will be the same as above except the file will be open in read+write mode, and if they do <> /dev/pts/0, that will be exactly the same as when the script is invoked non-redirected from an interactive shell in a terminal.

With:

your-script <&-

stdin will be closed.

With:

other-cmd | your-script

stdin will be a pipe in most shells (same as when doing < named-pipe or < <(cmd)), though could be a socket pair instead in ksh93.

In your-script & from a non-interactive shell, stdin will be /dev/null.

In output=$(your-script) or output=`your-script`, or cmd <(your-script), stdin will be left untouched but stdout will be a pipe.

In your-script |& (ksh) or coproc your-script (zsh, bash), both stdin and stdout will be a pipe.

If your script is started from:

ssh host your-script

that is, by sshd on host, then both stdin and stdout will be a pipe as well (with rsh, that would be the network socket directly in read+write).

If started by a cron or at job, stdin will likely be /dev/null, stdout a pipe (output if any will eventually be sent in an email).

etc.

To detect all these from within your script, there's no need for lsof.

To detect:

whether stdin is open: do a fcntl(STDIN, F_GETFL, 0) which would fail if stdin is not open.
in which mode it is open (r, w, rw): check for O_RDONLY, O_WRONLY, O_RDWR in the return value of fcntl() above.
the type of the file open on stdin (regular, pipe, device): do a fstat() system call (stat STDIN in perl) and get the type from the mode field in there. Or you can use perl's -f/-d/-p... to test each possible type of file.
for device files, whether it's a tty device, use the POSIX::isatty(STDIN) or -t.
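The checks listed above can be sketched in perl roughly as follows. This is only an illustration: describe_stdin is a made-up helper name, not something from the question or from any module, and the set of file types it distinguishes is an arbitrary choice.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL O_ACCMODE O_RDONLY O_WRONLY O_RDWR);

# Hypothetical helper: report whether fd 0 is open, in which mode,
# and what kind of file it refers to.
sub describe_stdin {
    # fcntl() returns undef when the descriptor is not open.
    my $flags = fcntl(STDIN, F_GETFL, 0);
    return { open => 0 } unless defined $flags;

    # fcntl() may return the string "0 but true"; adding 0 makes it numeric.
    my $acc  = ($flags + 0) & O_ACCMODE;
    my $mode = $acc == O_RDONLY ? 'r'
             : $acc == O_WRONLY ? 'w'
             : $acc == O_RDWR   ? 'rw'
             :                    'unknown';

    # The file tests do an fstat() on the handle behind the scenes.
    my $type = (-t STDIN) ? 'tty'
             : (-p STDIN) ? 'pipe'
             : (-f STDIN) ? 'regular file'
             : (-S STDIN) ? 'socket'
             : (-c STDIN) ? 'character device'
             : (-d STDIN) ? 'directory'
             :              'other';

    return { open => 1, mode => $mode, type => $type };
}

my $info = describe_stdin();
print $info->{open}
    ? "stdin: open, mode=$info->{mode}, type=$info->{type}\n"
    : "stdin: closed\n";
```

Running it as script.pl, script.pl < some-file, other-cmd | script.pl and script.pl <&- should show the different cases described earlier.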

But those will have little to do with answering the question: is there anything to read from stdin, or would a read() fail, block, or return EOF? For that you'd need things like poll().
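In perl, one way to ask "would a read block right now?" is select(), here through the core IO::Select module. This is a minimal sketch under assumptions: stdin_ready is a hypothetical helper name, and "ready" only means that a read would return immediately, which includes the EOF case, so it does not prove there is data.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: returns 1 if a read on STDIN would return at once
# (with data or with EOF), 0 if it would block or STDIN is not open.
sub stdin_ready {
    my ($timeout) = @_;
    $timeout = 0 unless defined $timeout;    # 0 means: poll, do not wait

    my $sel   = IO::Select->new(\*STDIN);
    my @ready = $sel->can_read($timeout);
    return @ready ? 1 : 0;
}

print stdin_ready(0)
    ? "a read on stdin would not block\n"
    : "a read on stdin would block (or stdin is closed)\n";
```

Note that a regular file or /dev/null is always reported as ready, so this does not by itself tell "intentional" input apart from an inherited or accidental fd 0.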

I'm not sure what your end goal is here, but it sounds like you want your script to have an interactive mode (where the user interacts with it) and a mode for automation and to switch between the two depending on what stdin and/or stdout are.

So that should just be about checking -t STDIN and maybe also -t STDOUT to check whether stdin and/or stdout is a tty (whether there's a user there interacting via the tty device).

answered Aug 11 '21 at 06:28 · edited Aug 12 '21 at 04:56

Lots to absorb here. I will study your answer. Thanks so much! – hepcat72 Aug 11 '21 at 06:34

First question. How do I call fcntl from within perl? When I try, I get this:

>perl -e 'print `fcntl(STDIN, F_GETFL, 0)`'
sh: -c: line 0: syntax error near unexpected token `STDIN,'
sh: -c: line 0: `fcntl(STDIN, F_GETFL, 0)'

– hepcat72 Aug 11 '21 at 14:44

OK. I was able to swap out my usage of lsof using your suggestions. I have 2 methods: amIPipedTo and amIRedirectedTo that were using lsof output. Their bodies are now entirely return(!-t STDIN && -p STDIN) & return(!-t STDIN && -f STDIN) respectively. I was already using -t elsewhere, but this was a great streamline - glad to not use lsof there. However, I still have the same edge case issue. Ie. amIRedirectedTo returns true because of that file I'm not intentionally opening: /Library/Perl/5.18/AppendToPath. It is there on STDIN. lsof at least showed me what was causing it. – hepcat72 Aug 11 '21 at 15:41
As to your pondering what my purpose is... my module (among other things) allows a user to provide files using an option (positional or with a flag) on the command line or alternatively provide the file content via pipe or redirect, in which case the argument is used as a file name stub instead of as the file to open. The stub (or input file name) is used in conjunction with a suffix option to compose output file names. I have currently 890 passing tests where the redirect/pipe is correctly identified - and this one test that fails due to that unexpected file on STDIN. – hepcat72 Aug 11 '21 at 15:49
@hepcat72, you mean something like cmd foo reads foo and creates foo.out and cmd foo < file reads stdin and creates foo.out? IMO, that's not a good design. The user should be able to decide when they want the input to come from stdin (whether that's a tty, pipe, /dev/null or regular file) or not. Why not cmd --stub=foo < file vs cmd foo or cmd --input=foo (where stub defaults to the name of the input file if not specified)? – Stéphane Chazelas Aug 11 '21 at 16:05
See perldoc -f fcntl for how to use perl's fcntl, though your issue above looks like a shell one. – Stéphane Chazelas Aug 11 '21 at 16:06
Yes, though like I said, I just want to understand what's going on. I have pondered whether to make --stub a separate option, but even if I do that, I still have this problem of detecting intentional input on STDIN. I could infer the absence of --stub to mean to ignore STDIN, but I also have a default stub if one is not provided and STDIN has input. I would like to be able to make the decision based on actual input detection rather than the presence of a flag. – hepcat72 Aug 11 '21 at 16:11
Unless stdin is closed (which should never happen), there will be something on input, whether that's a pipe, regular file, tty... It's rather the absence of a file argument or of a --input=file option that should be the cue to read from stdin. That's what most text utilities do, for instance: cat file reads a file; cat reads stdin (whether it's a tty or regular file or pipe). – Stéphane Chazelas Aug 11 '21 at 16:14
Some file options are optional, which is a use-case I want to support. I already use -t to ignore tty cases. So really it boils down to me understanding why this file is on STDIN and when to expect it or anything like it. – hepcat72 Aug 11 '21 at 16:17
And I use --flag - to be intentional about reading STDIN. But one option is allowed to read STDIN by default. – hepcat72 Aug 11 '21 at 16:18

score 1 · Accepted Answer · edited Aug 17 '21 at 20:10


When I run a perl script from within a perl script via backticks, (the inner script is the one falsely thinking there is input on STDIN)

The inner script RIGHTLY thinks there's input on STDIN; it's just that another open file got file descriptor 0 (which, to perl, is always given the file handle STDIN). As you know, programs run via qx{...} or `...` in perl inherit the stdin file descriptor from the outer script, just like any other subprocess.

Because the inner script inherits the raw file descriptor 0, not the perl STDIN file handle, this creates problems with buffering, as either the inner or the outer script may end up reading more input than it needs, possibly leaving nothing for the other. Consider the example:

$ echo text | perl -e '$junk=`perl -e "eof(STDIN)"`; print while <>'
$ # nothing!

Just by "testing for EOF", the inner script will leave no input for the outer script.

Doing an unbuffered read with sysread in the inner script will however work as expected:

$ cat inner.pl
sysread STDIN, $d, 2
$ echo text | perl -e '$junk = `perl inner.pl`; print while <>'
xt

[from the other answer]
With: your-script <&- stdin will be closed.

Closing file descriptors like stdin is never a good idea (daemons redirect them from /dev/null, they never close them), but is especially bad when running a script written in a language like perl or python, because that may cause stdin to end up open (and referring to the script) instead of closed:

$ cat script.pl
seek STDIN, 0, 0;
print while <STDIN>;
$ perl script.pl <&-
seek STDIN, 0, 0;
print while <STDIN>;

That happens because system calls like open(2) or socket(2) return the first free file descriptor; if stdin is closed, the returned fd will "become" the stdin.
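Following the remark above that daemons redirect stdin from /dev/null rather than closing it, a parent script that wants to keep its own stdin away from a child could do something like the sketch below. The helper name run_with_null_stdin is made up for illustration, and the approach (dup, reopen, restore) is one possible arrangement, not the only one.

```perl
use strict;
use warnings;

# Hypothetical helper: run a shell command with fd 0 pointing at /dev/null
# instead of closing fd 0, then put the original stdin back.
sub run_with_null_stdin {
    my ($cmd) = @_;

    # Duplicate the current fd 0 so it can be restored afterwards.
    # This fails if stdin is already closed; then there is nothing to restore.
    my $saved;
    my $have_saved = open($saved, '<&', \*STDIN);

    # Reopening keeps fd 0 occupied, so no later open() can land on it.
    open(STDIN, '<', '/dev/null') or die "cannot open /dev/null: $!";

    my $out = `$cmd`;

    if ($have_saved) {
        open(STDIN, '<&', $saved) or die "cannot restore STDIN: $!";
        close($saved);
    }
    return $out;
}

# Usage: the child sees a character device (/dev/null) on fd 0.
my $child = $^X . q{ -e 'print((-c STDIN) ? "chr" : "other")'};
print run_with_null_stdin($child), "\n";
```

With this, a child that tests -f STDIN or -p STDIN gets false, instead of accidentally matching whatever file happened to be opened first.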

answered Aug 11 '21 at 14:10 by user486445 · edited Aug 17 '21 at 20:10 by hepcat72

Hi. Yes, I actually know that it inherits the parent's STDIN. My code has cases that account for that. In fact, I have almost a thousand calls in one test suite that calls scripts using my module in backticks and none of them tell me there's input on stdin unless there's a pipe or redirect in the backticks. So the context by which I use "falsely" is that of "what I coded it to do". You're correct though. I should have made that clearer. – hepcat72 Aug 11 '21 at 14:15
Notice the thing about the eof(STDIN) you were also mentioning in your Q; eof(STDIN) actually reads from STDIN, and it may read everything from it ;-) – user486445 Aug 11 '21 at 14:19
Hmmm... that's a good point. I thought it ungetc's the character it read (according to perldoc -f eof) though this may be one of those warning cases it alludes to? – hepcat72 Aug 11 '21 at 14:25
Regardless, I added the eof call to debug. It's temporary. – hepcat72 Aug 11 '21 at 14:26
The stdio getc (or its perl equivalent, perl does its own buffering i/o via perlio, it no longer uses stdio) cannot put a character back in the file. If stdin is seekable, it may try to seek back, but if it's a pipe or tty, it absolutely cannot do anything like it. Try my first example with a seekable stdin. perl -e '...' <<<text instead of echo text | perl -e '...' to see the difference. – user486445 Aug 11 '21 at 14:31
Cool. I will check it out. Currently I'm marching through @Stéphane Chazelas suggestions, trying to pick up a thing or two. And my wife is bugging me about being antisocial on vacation. ;) I'll get around to it. I'm sure I'll have questions later. – hepcat72 Aug 11 '21 at 14:36

hepcat72 · Answer 3 · 2021-08-17T20:31:48.000

The answers thus far answer the question in general, but the specific question of when and why a child perl process has a file named AppendToPath open for reading on file descriptor 0 (STDIN) has not been directly addressed. The comments below by @zevzek elucidate what is likely happening. I am going to leave the selected answer, because it explains how things work and provides a mechanism that explains how STDIN ends up being a file handle to something other than standard input, but I am going to put it in the context of the AppendToPath file in my case, with a reproducible (on macOS using the system perl) example.

Though there is no way to know how a file descriptor was "created" -- the system keeps no history about it. If you're debugging, it helps a lot tracing your program, as with strace -f ./your_script.

Given the quote above, we don't know whether it's the parent or child perl process that opened AppendToPath, but given that AppendToPath is a file used to update perl's @INC, which is needed early - it was likely opened by the perl interpreter of the child to prepare to run the supplied script.

Here is a toy example where the parent closes STDIN and the child's STDIN (fd 0) turns out to be AppendToPath.

bash-3.2$ perl -e 'close(STDIN);
                   $c = q{perl -e } .
                        chr(39) .
                        q{print(fileno(STDIN),"\n"); } .
                        q{print qx{lsof -w -b -p $$}} .
                        chr(39);
                   print `$c`'
0
COMMAND   PID     USER   FD   TYPE             DEVICE SIZE/OFF                NODE NAME
perl5.18 8901 robleach  cwd    DIR                1,8      832            35897246 /Users/robleach/GoogleDrive/WORK/RPST
perl5.18 8901 robleach  txt    REG                1,8    37552 1152921500311880916 /usr/bin/perl5.18
perl5.18 8901 robleach  txt    REG                1,8  1305808 1152921500312070866 /System/Library/Perl/5.18/darwin-thread-multi-2level/CORE/libperl.dylib
perl5.18 8901 robleach  txt    REG                1,8  1568368 1152921500312405021 /usr/lib/dyld
perl5.18 8901 robleach    0r   REG                1,8       33            90514450 /Library/Perl/5.18/AppendToPath
perl5.18 8901 robleach    1   PIPE 0xd289f4ba11f1bbb8    16384                     ->0x4d76dba4a1ac82fd
perl5.18 8901 robleach    2u   CHR               16,0  0t13390                 723 /dev/ttys000
perl5.18 8901 robleach    3   PIPE 0x9f2f7b3ec7eb66ba    16384                     ->0xc303b3e01efc707c

And this toy example does not close STDIN, showing that fd 0 is then a tty (i.e. the real standard input).

bash-3.2$ perl -e '$c = q{perl -e } .
                        chr(39) .
                        q{print(fileno(STDIN),"\n"); } .
                        q{print qx{lsof -w -b -p $$}} .
                        chr(39);
                   print `$c`'
0
COMMAND   PID     USER   FD   TYPE             DEVICE SIZE/OFF                NODE NAME
perl5.18 8904 robleach  cwd    DIR                1,8      832            35897246 /Users/robleach/GoogleDrive/WORK/RPST
perl5.18 8904 robleach  txt    REG                1,8    37552 1152921500311880916 /usr/bin/perl5.18
perl5.18 8904 robleach  txt    REG                1,8  1305808 1152921500312070866 /System/Library/Perl/5.18/darwin-thread-multi-2level/CORE/libperl.dylib
perl5.18 8904 robleach  txt    REG                1,8  1568368 1152921500312405021 /usr/lib/dyld
perl5.18 8904 robleach    0u   CHR               16,0  0t14630                 723 /dev/ttys000
perl5.18 8904 robleach    1   PIPE 0xd289f4ba11f1bbb8    16384                     ->0x4d76dba4a1ac82fd
perl5.18 8904 robleach    2u   CHR               16,0  0t14630                 723 /dev/ttys000
perl5.18 8904 robleach    3   PIPE 0x9f2f7b3ec7eb66ba    16384                     ->0xc303b3e01efc707c

Note that without the close of STDIN, the file /Library/Perl/5.18/AppendToPath is not included in the lsof output. It's also notable that the STDIN file descriptor is defined when fileno(STDIN) is queried in the child.

The following is my own rephrasing (an edited quote) of @zenzek:

Quoting from the open(2) manpage: "The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process." After you close STDIN (fd 0), the next call to open will get that file descriptor (0), and things like -t STDIN in a child process will then be testing that newly opened file. Here is an example:

perl -le 'close(STDIN); \
          open PW, "/etc/passwd"; \
          print fileno(PW); \
          print `perl -e "print fileno(STDIN),qq(\n)";lsof  -w -b -p $$`'
0
0
COMMAND    PID     USER   FD   TYPE             DEVICE SIZE/OFF                NODE NAME
perl5.18 13320 robleach  cwd    DIR                1,8      832            35897246 /Users/robleach/GoogleDrive/WORK/RPST
perl5.18 13320 robleach  txt    REG                1,8    37552 1152921500311880916 /usr/bin/perl5.18
perl5.18 13320 robleach  txt    REG                1,8  1305808 1152921500312070866 /System/Library/Perl/5.18/darwin-thread-multi-2level/CORE/libperl.dylib
perl5.18 13320 robleach  txt    REG                1,8  1568368 1152921500312405021 /usr/lib/dyld
perl5.18 13320 robleach    0r   REG                1,8     6946            90523670 /private/etc/passwd
perl5.18 13320 robleach    1u   CHR               16,0  0t23169                 723 /dev/ttys000
perl5.18 13320 robleach    2u   CHR               16,0  0t23169                 723 /dev/ttys000
perl5.18 13320 robleach    3   PIPE 0x5898fd9b21b123d7    16384                     ->0xc4c5f2aecc8eaaf7

More relevant quotes:

File descriptor 0 is the stdin. It may not be the stdio's stdin stream object, or the perl's STDIN file-handle object (both higher-level wrappers, which may not refer to any actual file or file descriptor at all). But it's always the file descriptors which are inherited through exec, not any higher-level wrappers. Which is what happens when you run lsof or any other program via backquotes (unless the fd is marked with cloexec, which the standard fds should not be). Any file descriptor (including 0) is always inherited if it's open, but not if it's closed. If fd 0 is closed, any function which returns a fd (open(), accept(), socket(), epoll_create(), etc.) will return 0 if successful, since 0 is the lowest-numbered fd not currently used.

After closing STDIN, I can reproduce an arbitrary file opened in the parent being given fd 0, but I cannot reproduce an arbitrary file opened in the child being on fd 0 (because I suspect the open of AppendToPath happens even before the BEGIN block in the child script). I can reproduce (above) the AppendToPath file being given fd 0. I just can't definitively determine whether it's the parent or child that's opening it. But I believe it's a reasonable guess to say that it's being opened by the perl interpreter in the child process.

The second answer changed since the last time I read it. Though it would be nice if it didn't start with an incorrect assumption stating that I didn't know the child inherits the parent's handles (which I did/do). If you could remove that, I'd be happy to select it as the answer. – hepcat72 Aug 13 '21 at 10:46
And technically, your answer starts with a comment about "rightly" detecting input on STDIN, which turned out to not be STDIN: it was, as you point out later, something else that had been opened with fd 0 because STDIN was closed. Like your example with /etc/passwd. THAT wasn't STDIN. It's just another file handle (PW) that got the 0 file descriptor. – hepcat72 Aug 13 '21 at 11:32
OK. So here's a question. I assume that fd 0 in the child was created by the child and not inherited (because the parent closed STDIN). Is that correct, or is there a way to know that? – hepcat72 Aug 13 '21 at 12:00
Given that the file that was on fd 0 had to do with perl's @INC, do you think it would be a reasonable guess that the Perl interpreter of the child opened it? – hepcat72 Aug 13 '21 at 12:06
To rephrase: are you sure that fd 0 is always inherited, even if the parent closes it? I'm just curious why the parent interpreter would open that specific file after the (parent) body code was run. The interpreter needs @INC earlier than that, which is why I assumed the child opened it (and received fd 0 because it wasn't inherited). – hepcat72 Aug 13 '21 at 12:23
Thanks for helping me understand. Your patience was greatly appreciated. – hepcat72 Aug 13 '21 at 12:36
