If I have a file with
#!/usr/bin/env foobar
what is the fastest/best way to determine if this file has a hashbang? I hear you can just read the first 2 bytes? How?
With zsh:

if LC_ALL=C read -u0 -k2 shebang < file && [ "$shebang" = '#!' ]; then
    echo has shebang
fi
Same with ksh93 or bash:

if IFS= LC_ALL=C read -rN2 shebang < file && [ "$shebang" = '#!' ]; then
    echo has shebang
fi
Bash, though, would give false positives for files that start with NUL bytes followed by #!, since its read discards NUL bytes. It would also read all the leading NUL bytes: a one-tebibyte file created with truncate -s1T file, for instance, would be read in full, 2 bytes at a time.
So with bash, it would be better to use:

if IFS= LC_ALL=C read -rn2 -d '' shebang < file && [ "$shebang" = '#!' ]; then
    echo has shebang
fi

That is, read up to 2 bytes of a NUL-delimited record.
None of those fork processes or execute external commands, as read, [ and echo are all built into those shells.
POSIXly, you can do:
if IFS= read -r line < file; then
    case $line in
        ("#!"*) echo has shebang
    esac
fi
It is stricter in that it also requires a full line. On Linux at least, the newline is not required for a valid shebang though.
So you could do:
line=
IFS= read -r line < file
case $line in
    ("#!"*) echo has shebang
esac
It's slightly less efficient in that it would potentially read more bytes, with some shells one byte at a time. With our 1TiB sparse file, that would take a lot of time in most shells (and potentially use a lot of memory).
With shells other than zsh, it could also give false positives for files that start with NULs followed by #!.
With the yash shell, it would fail if the shebang contains sequences of bytes that don't form valid characters in the current locale. At least with version 2.39 and older, it would even fail if the shebang contained non-ASCII characters in the C locale, even though the C locale is meant to be the one where all characters are single bytes and every byte value forms a valid (even if not necessarily defined) character.
If you want to find all the files whose content starts with #!, you could do:
PERLIO=raw find . -type f -size +4c -exec perl -T -ne '
BEGIN{$/=\2} print "$ARGV\n" if $_ eq "#!"; close ARGV' {} +
We're only considering files that are at least 5 bytes large (-size +4c means more than 4 bytes), #!/x followed by a newline being the minimum realistic shebang.
-exec perl... {} +: we pass as many file paths to perl as possible, so as to run as few perl invocations as possible.
-T: taint mode works around that limitation of perl -n, whose <> opens each argument with the two-argument form of open (so a file name ending in | would be run as a command); with -T, perl refuses to do that. It also means it won't work for files whose name ends in ASCII spacing characters or |.
PERLIO=raw: causes perl to use read() system calls directly, without any I/O buffering layer (this affects the printing of file names as well), so it will do reads of size 2.
$/ = \2: when the record separator is set to a reference to a number, records become fixed-length ones (2 bytes here).
close ARGV: skips the rest of the current file after we've read the first record.

You can define your own "magic patterns" in /etc/magic and use file to test:
$ sudo vi /etc/magic
$ cat /etc/magic
# Magic local data for file(1) command.
# Insert here your local magic data. Format is described in magic(5).
0 leshort 0x2123 shebang is present
$ cat /tmp/hole2.sh #To prove [1] order of hex [2] 2nd line ignored
!#/bin/bash
#!/bin/bash
$ cat /tmp/hole.sh
#!/bin/bash
$ file /tmp/hole2.sh
/tmp/hole2.sh: ASCII text
$ file /tmp/hole.sh
/tmp/hole.sh: shebang is present
$ file -b /tmp/hole.sh #omit filename
shebang is present
0x2123 is the hex of '#!' with the two bytes in reverse (little-endian) order:
$ ascii '#' | head -n1
ASCII 2/3 is decimal 035, hex 23, octal 043, bits 00100011: prints as `#'
$ ascii '!' | head -n1
ASCII 2/1 is decimal 033, hex 21, octal 041, bits 00100001: prints as `!'
Alternatively, you can match the first two bytes as a string:
0 string \#\! shebang is present
ref: man 5 magic, man 1 file, man 1posix file
That should do it:
if [ "$(head -c 2 infile)" = "#!" ]; then
    echo "Hashbang present"
else
    echo "no Hashbang present"
fi
Not every head has a -c flag (POSIX head only specifies -n). Tricky, tricky portability...
– thrig, Nov 25 '17 at 02:00
Fast may or may not be best, depending on your feelings about compiling a bunch of C (or maybe some assembly, to get all that overhead of C out of the way, and all that tedious error checking, sheesh...).
#include <sys/types.h>
#include <err.h>
#include <fcntl.h>
#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>
#include <sysexits.h>
#include <unistd.h>

int Flag_Quiet;                 /* -q: only set the exit status */

void emit_help(void);

int main(int argc, char *argv[])
{
    int ch, fd;
    char two[2];
    ssize_t amount;

    while ((ch = getopt(argc, argv, "h?q")) != -1) {
        switch (ch) {
        case 'q':
            Flag_Quiet = 1;
            break;
        case 'h':
        case '?':
        default:
            emit_help();
            /* NOTREACHED */
        }
    }
    argc -= optind;
    argv += optind;

    if (argc < 1)
        emit_help();

    if ((fd = open(*argv, O_RDONLY)) == -1)
        err(EX_IOERR, "could not open '%s'", *argv);

    /* Only the first two bytes matter. */
    amount = read(fd, two, 2);
    if (amount == -1) {
        err(EX_IOERR, "read failed on '%s'", *argv);
    } else if (amount == 0) {
        /* errx, not err: errno says nothing useful at EOF */
        errx(EX_IOERR, "EOF on read of '%s'", *argv);
    } else if (amount == 2) {
        /* exit status: 0 = has a shebang, 1 = does not */
        amount = (two[0] == '#' && two[1] == '!') ? 0 : 1;
    } else {
        errx(EX_IOERR, "could not read two bytes from '%s'", *argv);
    }

    if (!Flag_Quiet)
        printf("%s\n", amount ? "no" : "yes");
    exit(amount);
}

void emit_help(void)
{
    fprintf(stderr, "Usage: hazshebang [-q] file\n");
    exit(EX_USAGE);
}
This will require some tweaks if you want a "no" on standard output alongside one of the (many!) err exits above. It's probably better to check the exit status.
The slower shell way, head -c 2 file, fails a quick portability test on OpenBSD:
$ head -c 2 /etc/passwd
head: unknown option -- c
usage: head [-count | -n count] [file ...]
$
find . -name "*.yml" -exec shebang -q {} \; -exec chmod 0755 {} \; worked.
– johnnyB, Jun 17 '19 at 21:56
Use grep in a one-liner:

if head -n 1 file | grep -q '^#!'; then echo "true"; fi
Using pwsh
I wanted a portable solution, which wouldn't buffer the entire file (what if the file has no newlines?).
$bytes = Get-Content $path -AsByteStream -TotalCount 2
$isShebang = '#!' -eq -join [char[]]$bytes
This gets the first two bytes, casts them to char, and joins them into a string to compare with '#!'.