Reading from file descriptor fails

Question

This question is about reading and writing on a file descriptor. See the following example:

#!/bin/sh

file='somefile'

# open fd 3 rw
exec 3<> "$file"

# write something to fd 3
printf "%s\n%s\n" "foo" "bar" >&3

# reading from fd 3 works
cat "/proc/$$/fd/3"

# only works if the printf line is removed
cat <&3

exit 0

This script outputs:

foo
bar

Expected output:

foo
bar
foo
bar

Opening and writing to the file descriptor succeeds. So does reading via /proc/$$/fd/3, but that is not portable. cat <&3, however, doesn't output anything. It does work when nothing is written to the file descriptor (e.g. when the printf line is commented out).

Why doesn't cat <&3 work, and how can I read the entire contents of a file descriptor portably (POSIX shell)?

After printf "%s\n%s\n" "foo" "bar" >&3, fd 3 has a read/write pointer at byte 8 of the file. Do you want cat <&3 to start reading from the beginning of the file rather than from byte 8? — Mark Plotnick, Feb 04 '15 at 20:37

score 7 · Accepted Answer · edited Apr 13 '17 at 12:37

cat <&3 does exactly what it's supposed to do, namely read from the file until it reaches the end of the file. When you call it, the file position on the file descriptor is where you last left it, namely, at the end of the file. (There's a single file position, not separate ones for reading and for writing.)

cat /proc/$$/fd/3 doesn't do the same thing as cat <&3: it opens the same file on a different descriptor. Since each file descriptor has its own position, and the position is set to 0 when opening a file for reading, this command prints the whole file and doesn't affect the script.
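The same contrast can be reproduced without /proc by opening the file by name a second time. Below is a minimal sketch, assuming a POSIX sh and that $1 names an initially empty, writable file. The helper name demo_offsets is hypothetical. Because <> does not truncate, any pre-existing content past byte 8 would show up after the first label.

```shell
#!/bin/sh
# demo_offsets is a hypothetical helper; $1 is an initially empty, writable file.
demo_offsets() {
    file=$1
    exec 3<>"$file"                  # fd 3: read/write, single offset at 0
    printf '%s\n%s\n' foo bar >&3    # that offset is now 8 (end of the data)
    echo "via fd 3:"
    cat <&3                          # starts reading at offset 8: prints nothing
    echo "via a new open:"
    cat "$file"                      # new descriptor at offset 0: prints foo, bar
    exec 3>&-                        # close fd 3
}
```

On an empty file this prints nothing between the two labels, and foo and bar only after the second one.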

If you want to read back what you wrote, you need to either reopen the file or rewind the file descriptor (i.e. set its position to 0). There's no built-in way to do either in a POSIX shell or in most sh implementations (there is one in ksh93). There is only one utility that can seek: dd, but it can only seek forward. (There are other utilities that may skip forward, but that doesn't help.)

I think the only portable solution is to remember the file name and open it as many times as necessary. Note that if the file isn't a regular file, you might not be able to seek backwards anyway.
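A sketch of the reopen approach, under the same assumptions: the script keeps the file name in a variable and, once it has finished writing, reopens fd 3 for reading with exec, which gives the descriptor a fresh offset of 0. The helper name write_then_reread is hypothetical.

```shell
#!/bin/sh
# write_then_reread is a hypothetical helper; $1 is the file name to remember.
write_then_reread() {
    file=$1
    exec 3>"$file"                   # open for writing (truncates the file)
    printf '%s\n%s\n' foo bar >&3
    exec 3<"$file"                   # reopen the same name on fd 3: offset is 0 again
    cat <&3                          # prints foo and bar
    exec 3<&-                        # close fd 3
}
```

This only works while the name still refers to the same file: if it has been removed or replaced in between, the reopen fails or reads a different file.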

score 6 · Answer 2 · edited Feb 04 '15 at 22:32

#!/bin/sh

exec 3>file
exec 4<file

printf "%s\n" "foo" "bar" >&3
cat <&4

By using separate file descriptors for reading and writing you get separate positions in the file. Writing doesn't change the reading position.
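To check that the two positions really are independent, the following sketch writes in two steps and reads after each one. Each cat <&4 picks up where the previous read on fd 4 stopped, no matter what fd 3 has written. The helper name incremental_read is hypothetical, and a POSIX sh is assumed.

```shell
#!/bin/sh
# incremental_read is a hypothetical helper; $1 is a writable file path.
incremental_read() {
    file=$1
    exec 3>"$file" 4<"$file"   # writer (truncates, offset 0) and reader (offset 0)
    printf '%s\n' foo >&3
    first=$(cat <&4)           # reads "foo"; fd 4's offset advances to 4
    printf '%s\n' bar >&3      # fd 3 keeps writing at its own offset, 4
    second=$(cat <&4)          # reads only the new data: "bar"
    exec 3>&- 4<&-             # close both descriptors
    printf '%s|%s\n' "$first" "$second"
}
```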

edited Feb 04 '15 at 22:32 by Marco · answered Feb 04 '15 at 21:59 by Hauke Laging

But the file is opened twice, so you may have a race condition. I don't know whether this matters for the OP. But I wonder whether there is a way to change the reading position (i.e. do a seek), possibly with some shell extension. – vinc17 Feb 04 '15 at 22:02
@vinc17 Why should there be a race condition with two descriptors but not with one? Problems arise when the application does internal buffering but if data is written to the kernel then the page cache takes care that every process has the correct view of the data. – Hauke Laging Feb 04 '15 at 22:07
Add an rm file between the two exec lines. You'll get an error because the file name no longer exists. This is not the case with the non-portable way using cat "/proc/$$/fd/3": rm only removes the name of a file that is still open, and the file itself is deleted once all descriptors referring to it are closed. – vinc17 Feb 04 '15 at 22:23
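The effect described in that comment can be checked with a small sketch, assuming a POSIX sh and standard unlink semantics. The helper name rm_between_opens is hypothetical. After rm, fd 3 still reads the data, but opening the name a second time fails. The second open is done in a subshell because a failed exec redirection may terminate a non-interactive shell.

```shell
#!/bin/sh
# rm_between_opens is a hypothetical helper; $1 is a writable file path.
rm_between_opens() {
    file=$1
    printf '%s\n' foo bar >"$file"
    exec 3<"$file"                    # first open succeeds
    rm -f "$file"                     # removes the name, not the open file
    cat <&3                           # still prints foo and bar
    if (exec 4<"$file") 2>/dev/null; then
        echo "reopen: ok"
    else
        echo "reopen: failed"         # the name no longer exists
    fi
    exec 3<&-                         # last descriptor closed: file is freed
}
```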

score 4 · Answer 3 · vinc17 · 2015-02-05T00:47:27.040

With ksh93, it is possible to seek:

#!/usr/bin/env ksh93
file='somefile'
exec 3<> "$file"
printf "%s\n%s\n" "foo" "bar" >&3
cat "/proc/$$/fd/3"
exec 3>#((0))
cat <&3
exit 0

I get:

foo
bar
foo
bar

as wanted.

edited Feb 05 '15 at 00:47 · answered Feb 04 '15 at 22:12 by vinc17

Although this might work, I can't expect ksh93 to be installed on every machine. Thus, this is not really portable. – Marco Feb 04 '15 at 23:51
I'd replace #!/bin/ksh93 with #!/usr/bin/env ksh93, you never know where the binary is located. On my system it's /usr/local/bin/ksh93, for instance. – Marco Feb 04 '15 at 23:58
@Marco OK, I've edited my answer to use #!/usr/bin/env ksh93. Concerning the portability, this depends on the context: a user or admin can generally install ksh93, while if the /proc file system is not provided by the OS, there isn't much one can do. BTW, one can hope that other shells support the ksh93 seek feature in the future. My RFE for zsh: http://www.zsh.org/mla/workers/2015/msg00386.html – vinc17 Feb 05 '15 at 00:52
While perhaps not as convenient, the script need not necessarily be executed by ksh to take advantage of ksh's awesome I/O handling. ksh -c '3>#((0))' should do in a pinch. – mikeserv Feb 05 '15 at 02:21
@mikeserv Or ksh93 -c '3>#((0))' because on some machines, ksh is another ksh implementation (e.g. mksh). Now, the best solution for the future would be to make other shells (bash and zsh, in particular) implement this feature. – vinc17 Feb 05 '15 at 02:44
You know what just occurred to me? What does one of those other shells do if, while reading a script on stdin like sh -s <script, the command ksh -c '>#((0))' is read? Is that a hacky little goto? – mikeserv Feb 05 '15 at 02:56
@mikeserv Don't you mean redirecting stdin with <#((0)) instead of >#((0))? Like this: zsh -c "sh -s <=(echo \"echo OK; ksh -c '<#((0))'\")" – vinc17 Feb 05 '15 at 08:59
@mikeserv Of course, I've tried. This outputs OK endlessly. – vinc17 Feb 05 '15 at 10:29

score 2 · Answer 4 · answered Feb 04 '15 at 23:23

You need the lseek function to reposition the file offset. bash has no builtin for it, so you need to write a basic program in C. An example, along with some discussion, can be found on the bash mailing list:

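#include <stdlib.h>  /* atoi */
#include <unistd.h>  /* lseek, SEEK_SET */

/* rewind the fd given as argv[1]; exit status 0 on success */
int main(int argc, char *argv[])
{
    return argc > 1 && lseek(atoi(argv[1]), 0L, SEEK_SET) == 0 ? 0 : 1;
}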

Of course, in a scenario as simple as the one in the question, you can always just reinitialize the descriptor

exec 3<> file

after the printf, but that is obviously not a general solution.

answered Feb 04 '15 at 23:23 by jimmij

Having to compile a C program makes this solution non-portable. Although this might work in theory, it's not a practical solution. – Marco Feb 04 '15 at 23:55
@Marco Perl is installed on most machines (if not all). So, you can use: perl -MPOSIX -e 'lseek(3,0,0)' – vinc17 Feb 05 '15 at 00:57
@Marco - compiling a C program should be portable to any POSIX machine - more so even than Perl. Why would you suggest it is not portable? – mikeserv Feb 05 '15 at 01:11
@mikeserv I think that the reason is that not all machines have a C compiler installed. Concerning perl, all Debian machines have it installed since it has standard priority. – vinc17 Feb 05 '15 at 01:16
@vinc17 - it's my understanding that a basic C compiler is part of the definition of a Unix system. Maybe I'm mistaken - but c99 is spec'd. Still, I'd be happy to know better if you do. I sometimes just use cc and a heredoc or a pipe to compile little utilities on the fly from stdin and run them. It's what I mean to do with this little cherry later (thanks jimmij). – mikeserv Feb 05 '15 at 01:44
@mikeserv Most systems are not POSIX conforming systems by default, and in particular, some utilities may be missing. This may be the case of c99 (and pax, which is even less likely to be installed). On a machine where I have an account: -bash: c99: command not found And c99 isn't necessarily ISO C99 conforming in practice (in particular concerning floating point). – vinc17 Feb 05 '15 at 02:14
@mikeserv What you're saying is complete nonsense. It is a fact that most systems are not POSIX-conformant. If you disagree with this choice, you should report bugs. Perl is rather stable, in particular on something as simple as an lseek. End of discussion. – vinc17 Feb 05 '15 at 02:36

mikeserv · Answer 5 · 2015-02-05T04:23:11.440

If you are doing some work on a file descriptor that you expect you'll want to read again, then you can portably do it in the body of a here-document:

exec 4<somefile 3<<plus
$(    printf %s\\n some random lines
        cat <&4
)
plus
cat <&3

The solution is perhaps not perfect: there is no portable way, for example, to seek back through the here-document file descriptor, and the command substitution will elide trailing blank lines from whatever cat writes to stdout. But all of these problems are easily handled.

In the first place, redirections are scoped to their containing compound command, so it is a small matter to nest these in a loop to handle any repetition you might require. It is also possible to implement a loop within the command substitution itself, or even to call out to an interactive shell on /dev/tty if you want. But most generally I prefer to link a heredoc to a function definition.

fn() { : do something w/ fd3
} 3<<INPUT
$1
$(: gen some output; : maybe cat or head from stdin)
INPUT
while IFS= read -r line; do fn "$line"; done
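Both ideas above rest on the same rule: a here-document is expanded each time the command it is attached to runs, whether that command is a brace group inside a loop or a function definition. The following is a minimal runnable sketch, assuming a POSIX sh. read_twice, the generated lines, and the sed body of fn are hypothetical placeholders.

```shell
#!/bin/sh
# read_twice: the here-document on the brace group is re-expanded on every
# iteration, so each pass gets a fresh fd 3 that starts at the beginning.
read_twice() {
    for pass in 1 2; do
        { cat <&3; } 3<<EOF
$(printf '%s\n' foo bar)
EOF
    done
}

# fn: the here-document belongs to the function definition, so it is
# expanded on every call, with $1 bound to that call's argument.
fn() {
    sed 's/^/> /' <&3
} 3<<INPUT
$1
$(printf '%s\n' "generated for $1")
INPUT

read_twice                                    # foo, bar, foo, bar
printf '%s\n' alpha beta |
    while IFS= read -r line; do fn "$line"; done
```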

It is also possible for one heredoc to read what another writes.

cat 4<<SAVED <<AND
$(  cat)
SAVED
$(  printf %s\\n some random lines
     cat <&4)
AND

Or you could loop like...

until test && cat <&4
do exec 3<&4 4<<IN
$(: gen output; cat <&3)
IN
done 4<infile

And of course you could call something like the fn above from within the loop as well.

As for the trailing blank line problem, if it is a problem, you can simply echo . at the tail end of the command substitution, then make sure you strip that last line from the fd when you read it, e.g. with sed \$d <&"$fd".
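Here is a sketch of that sentinel trick, assuming a POSIX sh. gen is a hypothetical generator whose output ends in two blank lines. Without the trailing echo ., the command substitution drops those blank lines. With it, sed '$d' removes only the sentinel line.

```shell
#!/bin/sh
# gen is a hypothetical generator whose output ends with two blank lines.
gen() { printf '%s\n' one two '' ''; }

# Without a sentinel: $(...) strips all trailing newlines, blank lines included.
without_sentinel() {
    exec 3<<EOF
$(gen)
EOF
    cat <&3
    exec 3<&-
}

# With a sentinel: "echo ." protects the blank lines, and sed '$d' drops the ".".
with_sentinel() {
    exec 3<<EOF
$(gen; echo .)
EOF
    sed '$d' <&3
    exec 3<&-
}
```

without_sentinel prints 2 lines and with_sentinel prints 4, matching gen's original output.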

But, while fairly safe, looping over files like this in the shell is usually a bad idea anyway: notice that there is at least one fork per generated file. Shells assign fds and utilities handle them, so do your loop in a sed or awk script, manipulate the file data with a standard utility, and then use the shell to direct the output. That is usually best practice.
