shebang or not shebang

Question

I want to use a program in the shebang, so I create a script named <myscript> with:

#!<mypgm>

I also want to be able to run <mypgm> directly from the command prompt.

<mypgm> args...

So far, no issue.

I want to be able to run <myscript> from the command prompt with arguments.

<myscript> blabla

In turn, the shebang causes <mypgm> to be called with the following arguments:

<mypgm> <myscript> blabla

Now, I need to know when <mypgm> <myscript> blabla is called using the shebang, or not:

myscript blabla # uses the shebang
-or-
<mypgm> myscript blabla   # directly in the command prompt.

I looked at the environment variables (edit: that turned out to be a wrong assertion) and at the process table (parent process too), but didn't find any way to tell the two cases apart.

The only thing I found so far is:

grep nonvoluntary_ctxt_switches /proc/$$/status

When this line is just after the shebang, the value is often 2 (sometimes 3) when called through the shebang, and 1 (sometimes 2) with the direct call. Since this is unstable and depends on process scheduling (the number of times the process was taken off its CPU), I am wondering whether anybody here has a better solution.

The meaning of the first parameter is different when it is called from the command prompt (anything), or provided by the shebang (the script filename). Testing if the first argument is an executable file is not safe enough. — Jacques, May 06 '19 at 11:15
This ^^^ should be the main part of your question. Your question as it is now tries to solve Y while you want to solve X. — pLumo, May 06 '19 at 11:32
I agree. I tried to find a solution but didn't take enough time to step back. To my defense, this is my first question here ;-) — Jacques, May 06 '19 at 11:39
Your question currently says that you want to distinguish between myscript blabla and mypgm myscript blabla, but I take it from the comments that what you really want to distinguish between is mypgm myscript blabla (potentially as a result of the shebang) and mypgm otherargs, is that right? — Stephen Kitt, May 06 '19 at 12:12
If I understand you well: no. In "mypgm myscript blabla", myscript might be an argument that, by (lack of) luck, is also the name of a script, but has a completely different meaning. In "myscript blabla", on the other hand, the shebang mechanism provides "myscript" as the name of the script for mypgm, and that is the intention. — Jacques, May 06 '19 at 12:18
Except that from mypgm’s perspective, myscript blabla ends up being mypgm myscript blabla, so I don’t understand what distinction you’re trying to make. — Stephen Kitt, May 06 '19 at 12:24
Both indeed end up as "mypgm myscript blabla", and that is exactly the issue I have: I need to know if it was invoked using a shebang (i.e. from a file), or directly from the command prompt (without any file in the process).
Small theoretical example. Let's say I want to create a calculator. I may call it from the command prompt with: "calc PI + 1". 3 arguments. Now, I start supporting the shebang to create files with calculations. So now, by lack of luck, if a file named PI exists, "calc PI + 1" from the command prompt will try to interpret what is in file PI. — Jacques, May 06 '19 at 12:33
I need to know if it was invoked using a shebang (i.e. from a file), or directly from the command prompt (without any file in the process. Ummm, why?!?! What difference does it make how your program is started? That's the real problem you need to solve. — Andrew Henle, May 06 '19 at 12:36
So you want to disallow mypgm myscript blabla (run as such explicitly) while allowing myscript blabla (with the shebang), is that right? — Stephen Kitt, May 06 '19 at 12:45
No, I want both ;-) for versatility, but I want to know how it was called. — Jacques, May 06 '19 at 12:48
So what practical difference is there between myscript blabla and mypgm myscript blabla for you? How do you distinguish between mypgm myscript blabla and mypgm otherargs? I’m trying to understand what you’d do with the information you’re asking for, once you have it. — Stephen Kitt, May 06 '19 at 13:08
In the theoretical calc example above, "calc PI + 1" should return 4.14159... Adding support for the shebang (i.e. a filename as the first parameter) would instead return the calculation contained in the file. Calling "calc PI + 1" (space between arguments) from the command prompt would then try to open file PI and look for a calculation inside, which is not what is intended. One could of course check whether file PI exists and is executable, but this is not a bulletproof workaround. Adding the nonvoluntary_ctxt_switches test mentioned above reduces the risk (sharply), but only reduces it. — Jacques, May 06 '19 at 13:24
Forget the shebang for a moment. If you want to allow calc myscript blah, how are you going to differentiate between calc PI where PI is a script, and calc PI + 1? (This is why most tools use an option for scripts, e.g. awk -f myscript.) — Stephen Kitt, May 06 '19 at 13:38
Can you not test for a tty? If it's run from the console then you'll have an interactive tty to write back to, otherwise you won't. I'm not actually sure if this test is useful, but I know a number of programs test for this... — djsmiley2kStaysInside, May 06 '19 at 16:30
@Jacques In your theoretical example, whether the command line began with calc or myprog does not change the (non)existence of a file named "PI". Either way, the behavior of the command is conditional upon whether the file exists or not, and as Stephen suggests, it's far better to explicitly do calc -f PI where the calculation is stored in a file named PI instead of on the command line. — Monty Harder, May 06 '19 at 21:54

score 21 · Answer 1 · edited May 06 '19 at 14:07

Instead of having mypgm magically detect whether it is being used in a shebang, why not make that explicit by using a command-line flag (such as -f) to pass it a file as a script?

From your example in the comments:

In the calc theoretical example above. calc PI + 1 should return 4.14159... Now adding the support for the shebang (i.e. a filename as the first parameter) would return the calculation contained into the file.

Make calc take a script file through -f and then create scripts with:

#!/usr/local/bin/calc -f
$1 + 1

Let's say you call this file addone.calc and make it executable. Then you can call it with:

$ ./addone.calc PI
4.141592...

That call will translate into an invocation of /usr/local/bin/calc -f ./addone.calc PI, so it's pretty clear which argument is a script file and which is a parameter to the script.

This is similar to how awk and sed behave.

A similar (but opposite) approach is to have calc take a script file argument by default (which simplifies its use with a shebang), but add a command-line flag to use it with an expression from an argument. This is similar to how sh -c '...' works.
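To make the -f variant concrete, here is a minimal sketch of the argument handling in POSIX shell. The function name calc_mode and its output format are invented for illustration; it only classifies the arguments and does not evaluate anything.

```sh
# Hypothetical helper: decide whether calc was given a script file
# (-f FILE [PARAMS...]) or an inline expression, and print the decision.
calc_mode() {
    if [ "$1" = "-f" ]; then
        if [ "$#" -lt 2 ]; then
            echo "error: -f needs a file name" >&2
            return 2
        fi
        script=$2
        shift 2
        # Everything after the file name is a parameter to the script.
        echo "script:$script params:$*"
    else
        # No -f: all arguments together form the expression.
        echo "expr:$*"
    fi
}
```

With this, calc_mode -f ./addone.calc PI reports a script plus one parameter, while calc_mode PI + 1 is always treated as an expression, whether or not a file named PI exists.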

answered May 06 '19 at 13:43 by filbranden · edited May 06 '19 at 14:07 by Kusalananda

I already thought of this solution. Unfortunately (and I know it is not perfect at all), in my environment I need a shebang with env: "#!/bin/env calc". In that situation, adding an argument makes the shebang try to call "calc -f" as a single command name, not calc with the option -f. – Jacques May 06 '19 at 18:39
In many environments, shebang lines are quite limited in terms of the number of arguments they can take. If you're trying to use the environment (#!/usr/bin/env) then if you want your script to be portable you can't add command-line arguments. If you want something more complex than to simply specify one single interpreter, you're abusing the shebang and may or may not get the results you're looking for and will quite likely lose portability. – Scott Severance May 06 '19 at 20:15
If you want to use the shebang with env, then turn it around and make the script file execution the default and use a command-line flag for passing an expression inline from the arguments. There is really no reliable way to find whether your program is being called from a shebang. IMO, that's by design, so we shouldn't be trying to go around that. – filbranden May 06 '19 at 22:11
@ScottSeverance , you'd have to be using a really old Unix to be limited in any practical way in the number of arguments you could pass to the interpreter on the shebang line. By (partial) coincidence, I was reading just this morning about the technical limitations of the shebang line on this site: #! magic ... 127 bytes for the shebang line seems to be the minimum these days. – Todd Walton Oct 06 '21 at 14:27
@todd: You might be right, but I don't know for sure. What I do know is that my web hosting company runs FreeBSD and a whole lot of things I take for granted on Linux don't work there. I haven't tested the shebang limits specifically, but I've learned that in many cases what seems more modern to me turns out to be limited to Linux. – Scott Severance Oct 06 '21 at 16:37

alexis · Answer 2 · 2019-05-07T09:26:58.543

The real problem is the way you designed the commandline syntax of <mypgm>. Instead of trying to support two ways of interpreting its arguments, provide two ways of calling it instead.

Shebang commands are meant to be script engines that execute the content of your script; it might be bash, perl, or whatever, but the expectation is that it is called with the file name of a script to execute. How does bash do it? It does not guess. If it encounters any argument that does not look like an option (or an option's argument), it treats it as the script to execute; arguments after that are passed to the script. For example:

/bin/bash -x -e somename foo bar

Here, bash will look for the file somename and try to run it as a script with arguments foo and bar. You should do the same thing, because you might want to write <mypgm> <myscript> on the command line some day.

If you want the script-less use of <mypgm> to be the default, you can require a script to be passed with <mypgm> -f <myscript>. This is how sed does it. Then you'd use it in a shebang line like this:

#!<mypgm> -f

If you want the script case to be the default, like with bash and perl, create an option that says "there is no script this time". You could use -- for this, so that <mypgm> -- one two three does not try to run one (or anything else) as a script. In that case the shebang line would just read:

#!<mypgm>
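A rough sketch of the "script by default" design in POSIX shell, for illustration only. The name mypgm_mode and the printed format are made up, and the sketch assumes that no option takes an argument of its own; a real program would have to handle those.

```sh
# Hypothetical helper: skip leading options, then treat the first
# non-option argument as the script, unless "--" says there is none.
mypgm_mode() {
    while [ "$#" -gt 0 ]; do
        case $1 in
            --) shift
                echo "noscript:$*"
                return 0 ;;
            -*) shift ;;    # an option; a real program would record it
            *)  script=$1
                shift
                echo "script:$script args:$*"
                return 0 ;;
        esac
    done
    echo "noscript:"
}
```

So mypgm_mode -x -e somename foo bar picks somename as the script, and mypgm_mode -- one two three never looks for a script at all.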

score 5 · Answer 3 · edited Jun 11 '20 at 14:16

Now, I need to know when <mypgm> <myscript> blabla is called using the shebang, or not:

In C, you can obtain that info via getauxval(AT_EXECFN), which will tell you the name of the original executable (i.e. the first argument passed to execve(2)) [1].

But that string is placed in the memory immediately after the command line arguments and environment strings, at the end of the [stack] memory region, so it can be fetched directly from there.

For instance, the following perl script (name it foo.pl), if made executable with chmod 755 foo.pl, will print ./foo.pl when run directly and /usr/bin/perl when run as perl ./foo.pl:

#! /usr/bin/perl
# find the end address of the [stack] mapping
open my $maps, "<", "/proc/self/maps" or die "open /proc/self/maps: $!";
my $se;
while(<$maps>){ $se = hex($1), last if /^\w+-(\w+).*\[stack\]$/ }
# read the last 512 bytes of the stack and print the final NUL-terminated string
open my $mem, "<", "/proc/self/mem" or die "open /proc/self/mem: $!";
sysseek $mem, $se - 512, 0;
sysread $mem, my $d, 512 or die "sysread: $!";
print $d =~ /([^\0]+)\0+$/, "\n";

On newer (>=3.5) Linux kernels the end of the environment is also available in /proc/PID/stat (in the 51st field, as documented in the proc(5) manpage).

#! /usr/bin/perl
open my $sh, "/proc/self/stat" or die "open /proc/self/stat: $!";
# split into fields, keeping the parenthesized command name as one field
my @s = <$sh> =~ /\(.*\)|\S+/g;
open my $mem, "/proc/self/mem" or die "open /proc/self/mem: $!";
seek $mem, $s[50], 0;
$/ = "\0";
my $pn = <$mem> or die "readline: $!"; chomp $pn; print "$pn\n";

[1] Linux kernels newer than 2.6.26 introduced the aux vector entry pointing to it (see the commit), but the executable name was available at the end of the stack long before that (since linux-2.0 from 1996).

answered May 06 '19 at 12:31 · edited Jun 11 '20 at 14:16 by Community

Size of the array is 44 for me, not 51 or more. :-( – Jacques May 06 '19 at 12:49
What system are you running on (uname -a)? The format of /proc/PID/stat on Linux hasn't changed in a long time. – May 06 '19 at 12:50
Linux xxxxxxxxxxx.com 2.6.32-573.el6.x86_64 #1 SMP Thu Jul 23 15:44:03 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux – Jacques May 06 '19 at 12:56
It's at the end of the stack, anyway. You can get the end of the stack from /proc/PID/maps. Very gross example: off="0x$(awk -F'[- ]' '/\[stack\]/{print$2}' /proc/$$/maps)"; dd bs=1 if=/proc/$$/mem skip=$((off - 64)) status=none count=64 | perl -nle 'print /.*\0([^\0]+)\0*$/' – May 06 '19 at 13:16
Maybe that's because I'm not admin? – Jacques May 06 '19 at 13:26
No, it's because the kernel is too old. Those /proc/PID/stat fields are not present in Linux 2.6. You can combine maps + mem as in the comment above, but I'm not able to test or improve that now. – May 06 '19 at 14:24
@Jacques: I wouldn't admit on the Internet I was running a 4-year-old build of a Linux kernel on a machine connected to the Internet! Spectre and Meltdown weren't discovered yet. (Although to be fair, those are local info-leak exploits, someone would need a way to run code on your desktop, e.g. if you run a web browser that JITs JavaScript.) And 2.6.32 was originally released in 2009. That's ancient. Sure EL6 backports some patches, but you miss out on new features when running crusty old software. – Peter Cordes May 06 '19 at 21:34
@PeterCordes There are a lot of embedded systems which are not upgradable, but still used and useful (come and pown my dashcam). And btw, only a tiny minority of linux systems are running x86 and are affected by fashionable bugs like spectre and meltdown. – May 07 '19 at 00:35
@mosvy: el6.x86_64 definitely identifies it as x86-64. Maybe AMD or some other non-Intel vendor so not Meltdown, but in-order x86-64 CPUs (which wouldn't be vulnerable to Spectre) are very rare. Spectre is also a thing on out-of-order ARM or MIPS CPUs; only Meltdown is specific to Intel microarchitectures. And BTW, maybe you mean a tiny minority of embedded Linux systems run on x86? If we count only Linux systems that people use interactively via a command shell and ask questions like this about, most are x86. – Peter Cordes May 07 '19 at 01:05
@Jacques I've updated with a /proc/self/maps version (the 1st script) which should work with older kernels. This is not easily doable in the shell, because older kernels won't let you access the memory of another process, unless you've ptrace-d it first, and the shell has no builtin command which could be used to read /proc/self/mem (and running dd will have to fork a separate process). – May 07 '19 at 04:37

score 2 · Answer 4 · answered May 07 '19 at 01:32

You can keep a regular calc program with usage like calc PI + 3 (and, as an extension, calc -f script_file_name).

For use in a shebang, create a link (only hard links work, if I recall correctly) named e.g. calcf, and then have the calc program check the name it was executed under (for C/C++, look at argv[0] in main). Your scripts then start with #! /some/path/calcf.

That way you avoid options like -c on the command line (3 keystrokes saved) and you don't need options in the shebang (which may be problematic, as noted in Scott's comment).
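If calc itself were a shell script, the name check could look roughly like this. It is a sketch only: mode_from_name is a made-up helper, and calcf is the link name suggested above.

```sh
# Hypothetical helper: pick the mode from the name the program was run as.
# ${1##*/} strips any leading directories, like basename would.
mode_from_name() {
    case ${1##*/} in
        calcf) echo script ;;   # invoked through the link: first argument is a script file
        *)     echo expr ;;     # invoked as calc: arguments form an expression
    esac
}
```

Inside the program it would be called as mode_from_name "$0".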

I sense the kernel of an answer here, but it’s wrapped in confusion. Can you please [edit] this to make it clearer how you are addressing the question? — Scott - Слава Україні, May 07 '19 at 02:01

Jacques · Accepted Answer · 2019-05-10T15:00:01.330

Just realized that the following environment variable does it all: $_

When launched using <myscript>, its value is './<myscript>'

When launched using <mypgm> <myscript> its value is the full path to <mypgm>.

That simple, in my case:

#!/bin/bash

how_called=$_

if [[ "X$how_called" == "X$0" || "X$how_called" == "X$BASH" ]]; then
#                                 ^in this case, if the login shell is not bash
   shebang=0
else
   shebang=1
fi

bn=$(basename "$0")

A bit later (for my purpose):

if (( shebang == 1 )) || [[ ! -z $1 && "X$1" != X-* && "X$1" == X*\.${bn:0:3} && -x $1 ]]; then 
   # ^ shebang: first argument is the script file
   #                        ^ or not shebang: first argument **may** be a script file name
   #                                                    ^ ensure that this is a script by script extension
   #                                                       (otherwise just use the more verbose but standard --script=...)

   shebang_fn="$1"
   shift 1
   set -- --script="$shebang_fn" "$@" # fall back on standard way.
fi

(I know that I'm flipping the table a bit here, and that we still have to ensure that this is a portable solution).
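For experimenting with the comparison without having to set up real shebang scripts, the test can be pulled into a small function that takes its inputs explicitly. This is a sketch: launched_via_shebang is a made-up name, and the whole heuristic only holds when the invoking shell exports _ to the programs it starts, which not every shell or launcher does.

```sh
# Hypothetical helper: succeeds (exit status 0) when the saved value of $_
# is neither the program's own path nor the interpreter's path, which is
# what this answer takes as the sign of a shebang launch.
launched_via_shebang() {
    how_called=$1   # value of $_ saved by the very first command of the program
    self=$2         # the program's own path ($0)
    interp=$3       # the interpreter's path ($BASH)
    [ "$how_called" != "$self" ] && [ "$how_called" != "$interp" ]
}
```

In the program it would be used as launched_via_shebang "$how_called" "$0" "$BASH".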

Jacques · Answer 6 · 2019-05-07T11:34:44.183

I may have found something closer to the solution:

cat /proc/$$/cmdline | tr '\0' '\n'

Called from command prompt, output is:

bash
<path to mypgm>
<myscript>

Called from shebang:

bash
<path to mypgm>
./<myscript>

Third line is different.

The sort of solution below is still heuristic, not a final one. Indeed, as several people mentioned here, playing with specific option either in shebang mode (-f ) or in command line mode (-c ...) would be 100% bullet proof (the hard link solution too).

However, this solution wouldn't be satisfying for the situation encountered. Moreover, both solutions are NOT mutually exclusive. You could either rely on the heuristic below in absence of -f and -c options, or on the bullet proof solution in case you're using one of them.

Heuristic equivalent in bash.

    #!/bin/bash

    # first statement for more accuracy
    shebang=$(</proc/$$/status)

    # extract the counter from the snapshot taken above
    shebang=$(awk '$1 == "nonvoluntary_ctxt_switches:" {print $2}' <<<"$shebang")

    maybe_file=$(awk -v RS="\0" 'NR==3' /proc/$$/cmdline)

    if    [[ "X$maybe_file" == X*/* ]] \
       && [[ "X$maybe_file" != X-* ]] \
       && (( shebang > 1 )) \
       && [[ -x "$maybe_file" ]]; then
        shebang=1
    else
        shebang=0
    fi

Lowering the probability, even to a very low level, is not a nice option (I would understand people voting this down), but mixing this heuristic with the bulletproof solution might be (or not) better than nothing until a more definitive solution is found.

I understand perfectly what's happening, the pros and the cons of each comment provided (thanks a lot to all of you!). I'm just a bit surprised that there is apparently no way to get the original command line after the shebang.

<mypgm> ./myscript is still a valid form, and would yield the same results as your "called from shebang" option. — mgarciaisaia, May 06 '19 at 17:24
I did not pretend that it was THE solution, but wrote "CLOSER to the solution". I know this is not yet perfect but it is already better than using the heuristic [[ -x <myscript> ]] and nonvoluntary_ctxt_switches in /proc/$$/status. I proposed this as a possible starting point to the solution, as I may still edit it ;-) — Jacques, May 06 '19 at 18:47
"there is apparently no way to get the original command line" On linux (your Q is tagged "linux") there very much is a well defined and documented way to retrieve the original command, as I explained in my answer. As to that being hard to do in pure shell + standard utilities, everything is hard to impossible to do in those conditions, including trivial tasks like (reliably) sorting a list of files by size. — , May 07 '19 at 12:42
