Why do we have to pass the file name twice in exec functions?

Question

I read chapter 8 of Advanced Programming in the UNIX Environment by Stevens, and I understand all six of the exec functions.

One thing I notice is, in all the exec functions:

first argument is the file name / path name (depends on the exec function).
second argument is argv[0] that we get in main(), which is the file name itself.

So here we do have to pass the file name twice in the function.

Is there any reason for it (like we cannot get the file name from the path name from the first argument)?

goldilocks · Answer 1 · 2015-03-02T12:58:57.920

So here we do have to pass the file name twice in the function.

They are not quite the same thing, as you noticed when you observed that one of them is used as the argv[0] value. That value doesn't have to match the basename of the executable; most programs ignore it, and you can put whatever you want in there.

The first one is the actual path to the executable, for which there is an obvious necessity. The second one is passed to the process ostensibly as the name used to invoke it, but, e.g.:

execl("/bin/ls", "banana", "-l", (char *)NULL);

Will work fine, presuming /bin/ls is the correct path.
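To see what the child actually receives, here is a minimal sketch that runs /bin/sh under a caller-chosen argv[0] and captures what the shell reports as $0. The helper name shell_name_seen is made up for illustration, and the sketch assumes a POSIX system where /bin/sh is an ordinary shell rather than a multi-call binary that dispatches on argv[0].

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run /bin/sh with a caller-chosen argv[0] and copy
   what the shell reports as $0 into out. Returns 0 on success, -1 on error. */
int shell_name_seen(const char *argv0, char *out, size_t outlen)
{
    int fds[2];
    if (outlen == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: send stdout to the pipe, then exec the shell. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        /* 1st argument: the file to execute.
           2nd argument: what the new program sees as argv[0]. */
        execl("/bin/sh", argv0, "-c", "printf %s \"$0\"", (char *)NULL);
        _exit(127); /* only reached if execl failed */
    }

    /* Parent: read the child's output until EOF or until out is full. */
    close(fds[1]);
    size_t used = 0;
    ssize_t n;
    while (used < outlen - 1 &&
           (n = read(fds[0], out + used, outlen - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling shell_name_seen("banana", buf, sizeof buf) should leave the string banana in buf, even though the file executed was /bin/sh.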

Some applications do, however, make use of argv[0]. Usually these have one or more symlinks in $PATH; this is common with compression utilities (sometimes they use shell wrappers instead). If you have xz installed, stat $(which xzcat) shows that it is a link to xz, and man xzcat shows the same page as man xz, which explains that "xzcat is equivalent to xz --decompress --stdout". xz can tell how it was invoked by checking argv[0], which makes these two calls equivalent:

execl("/bin/xz", "xzcat", "somefile.xz", (char *)NULL);
execl("/bin/xz", "xz", "--decompress", "--stdout", "somefile.xz", (char *)NULL);
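On the receiving side, a program that wants to behave this way typically looks at the last path component of argv[0]. The following is only a sketch of that idea, not xz's actual code; the names invoked_name, default_mode and the enum are hypothetical.

```c
#include <string.h>

enum mode { MODE_COMPRESS, MODE_DECOMPRESS_TO_STDOUT };

/* Return the part of argv0 after the last '/', or all of it if there is none. */
const char *invoked_name(const char *argv0)
{
    const char *slash = strrchr(argv0, '/');
    return slash ? slash + 1 : argv0;
}

/* Hypothetical dispatch: choose a default mode from the invocation name.
   A NULL or unrecognised name falls back to the normal mode. */
enum mode default_mode(const char *argv0)
{
    if (argv0 != NULL && strcmp(invoked_name(argv0), "xzcat") == 0)
        return MODE_DECOMPRESS_TO_STDOUT;
    return MODE_COMPRESS;
}
```

A main() would call default_mode(argv[0]) before parsing options, so explicit flags can still override the default chosen by name.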

Ah, so this would explain how busybox can be what you want it to be depending on how you call it right? — terdon, Mar 02 '15 at 18:09
@terdon that's exactly how the single binary for busybox satisfies so many different commands. — mah, Mar 02 '15 at 19:10
Which would mean that if /bin/ls was busybox, it wouldn't know how to execute banana! — Riking, Mar 02 '15 at 19:30

score 13 · Answer 2 · answered Mar 02 '15 at 11:54

You don't have to pass the file name twice.

The first one is the file that is actually exec'ed.

The second argument is what should be the argv[0] of the process, i.e. what the process should see as its name. E.g. if you run ls from the shell, the first argument is /bin/ls, the second is just ls.

You can exec a certain file and call it something else via the second argument; the program can check its name and behave differently according to the name. This can also be done via hard links (or symbolic links) but this way gives more flexibility.

wurtel

In fact links are the same method since that sets argv[0] to the link name. – goldilocks Mar 02 '15 at 11:57
In the last paragraph: "You can exec a certain file and call it something else via the second argument; the program can check its name and behave 'differently' according to the name". Can you please elaborate or point me to some reading? I am new to this environment. – munjal007 Mar 02 '15 at 12:37
The last part of goldilocks' answer explains this. – wurtel Mar 02 '15 at 13:13

score 3 · Answer 3 · answered Mar 02 '15 at 14:05

The takeaway is that argv[0] can be set to anything (including NULL). By convention, argv[0] will be set to the path the executable was started as (by the shell process when it does the execve()).

If ./foo and dir/bar are two different links (hard or symbolic) to the same executable, then starting the program from the shell using the two paths will set argv[0] to ./foo and dir/bar, respectively.

The fact that argv[0] can be NULL is often overlooked. For example, the following code might crash when argv[0] is NULL (though glibc's printf prints (null) for it instead of crashing):

if (argc != 3) {
    fprintf(stderr, "%s: expected 2 arguments\n", argv[0]);
    exit(EXIT_FAILURE);
}
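One defensive option is to route every use of argv[0] through a small helper that substitutes a fallback. This is a sketch; program_name and its fallback parameter are names invented here, and it also treats an empty string as missing.

```c
#include <stddef.h>

/* Return a printable program name even when argc is 0 or argv[0] is
   NULL or empty; fallback is whatever the caller wants shown instead. */
const char *program_name(int argc, char *const argv[], const char *fallback)
{
    if (argc > 0 && argv != NULL && argv[0] != NULL && argv[0][0] != '\0')
        return argv[0];
    return fallback;
}
```

The error message above would then use program_name(argc, argv, "prog") in place of argv[0].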

An alternative on Linux is to use /proc/self/exe for such cases.
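A minimal sketch of reading that link, assuming Linux with /proc mounted (self_exe_path is a hypothetical helper name). Note that it yields the resolved path of the executable, not the name it was invoked under, so it cannot replace argv[0] for programs that dispatch on their name.

```c
#include <unistd.h>

/* Copy the path of the running executable into buf (Linux-specific).
   Returns the path length, or -1 on error or if buf may be too small. */
ssize_t self_exe_path(char *buf, size_t buflen)
{
    if (buflen < 2)
        return -1;
    /* readlink() does not NUL-terminate and truncates silently. */
    ssize_t n = readlink("/proc/self/exe", buf, buflen - 1);
    if (n < 0 || (size_t)n >= buflen - 1)
        return -1; /* error, or the result may have been truncated */
    buf[n] = '\0';
    return n;
}
```

A buffer of PATH_MAX bytes is a common choice for buf, though the link target is not strictly guaranteed to fit.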

Ulfalizer

How can you set argv[0] to both ./foo and dir/bar? – munjal007 Mar 02 '15 at 16:34
@munjal007 I'm sorry if I was being unclear. I meant running the program twice: once as ./foo and once as dir/bar. argv[0] will be different for those two cases (in each case it'll be the same as the path you used). – Ulfalizer Mar 02 '15 at 16:39
@munjal007 That's assuming that you run it from the shell of course. The point is that you could set argv[0] to anything when you exec*() the program yourself. It's a convention of the shell to set argv[0] to the path that was used to start the program (and it's wise to do the same when you exec*() a program, since many programs inspect argv[0] and expect it to hold the path). – Ulfalizer Mar 02 '15 at 16:43
