> it seems Bash is a Turing-complete language
The concept of Turing completeness is entirely separate from many other concepts useful in a language for programming in the large: usability, expressiveness, understandability, speed, etc.
If Turing-completeness were all we required, we wouldn't have any programming languages at all, not even assembly language. Computer programmers would all just write in machine code, since our CPUs are also Turing-complete.
> why is Bash used almost exclusively to write relatively simple scripts?
Large, complex shell scripts — such as the `configure` scripts output by GNU Autoconf — are atypical for many reasons:
Until relatively recently, you couldn't count on having a POSIX-compatible shell everywhere.
Many systems, particularly older ones, do technically have a POSIX-compatible shell somewhere on the system, but it may not be in a predictable location like `/bin/sh`. If you're writing a shell script and it has to run on many different systems, how then do you write the shebang line? One option is to go ahead and use `/bin/sh`, but choose to restrict yourself to the pre-POSIX Bourne shell dialect in case it gets run on such a system.
Pre-POSIX Bourne shells don't even have built-in arithmetic; you have to call out to `expr` or `bc` to get that done.
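To make that concrete, here's a minimal sketch of what writing to that restricted dialect looks like; the variable and the values are made up for illustration:

```sh
#!/bin/sh
# Restricting ourselves to the pre-POSIX Bourne dialect: no $(( )) arithmetic,
# so every calculation forks an external program.
count=0
count=`expr $count + 1`      # old-school: call out to expr(1)
echo "count is $count"

# A POSIX shell would let you do the same thing without a fork:
#   count=$((count + 1))
```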
Even with a POSIX shell, you're missing out on associative arrays and other features we've expected to find in Unix scripting languages since Perl first became popular in the early 1990s.
That fact of history means there is a decades-long tradition of ignoring many of the powerful features in modern Bourne family shell script interpreters purely because you can't count on having them everywhere.
In fact, this continues to this day: Bash didn't get associative arrays until version 4, but you might be surprised how many systems still in use are based on Bash 3. Apple still ships Bash 3 with macOS in 2017 — apparently for licensing reasons — and Unix/Linux servers often run all but untouched in production for a very long time, so you might have a stable old system still running Bash 3, such as a CentOS 5 box. If you have such systems in your environment, you can't use associative arrays in shell scripts that have to run on them.
If your answer to that problem is that you only write shell scripts for "modern" systems, you then have to cope with the fact that the last common reference point for most Unix shells is the POSIX shell standard, which is largely unchanged since it was introduced in 1989. There are many different shells based on that standard, but they've all diverged to varying degrees from that standard. To take associative arrays again, `bash`, `zsh`, and `ksh93` all have that feature, but there are multiple implementation incompatibilities. Your choice, then, is to only use Bash, or only use Zsh, or only use `ksh93`.
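As a rough illustration of the kind of divergence I mean (treat this as a sketch rather than a compatibility reference; exact syntax varies by shell version, and the array and keys here are made up):

```bash
# bash 4+ and ksh93: declare/typeset -A, keys listed with ${!arr[@]}
declare -A color            # ksh93 spells this: typeset -A color
color[apple]=red
color[sky]=blue
for k in "${!color[@]}"; do echo "$k is ${color[$k]}"; done

# zsh: typeset -A, traditionally initialized from key/value pairs, with keys
# listed via a parameter expansion flag instead of ${!...}
#   typeset -A color
#   color=(apple red sky blue)
#   for k in "${(k)color[@]}"; do echo "$k is $color[$k]"; done
```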
If your answer to that problem is, "so just install Bash 4," or `ksh93`, or whatever, then why not "just" install Perl or Python or Ruby instead? That is unacceptable in many cases; defaults matter.
None of the Bourne family shell scripting languages support modules.
The closest you can come to a module system in a shell script is the `.` command — a.k.a. `source` in more modern Bourne shell variants — which fails on multiple levels relative to a proper module system, the most basic of which is namespacing.
Regardless of programming language, human understanding starts to flag when any single file in a larger overall program exceeds a few thousand lines. The very reason we structure large programs into many files is so that we can abstract their contents to a sentence or two at most. File A is the command line parser, file B is the network I/O pump, file C is the shim between library Z and the rest of the program, etc. When your only method for assembling many files into a single program is textual inclusion, you put a limit on how large your programs can reasonably grow.
For comparison, it would be as if the C programming language had no linker, only `#include` statements. Such a C-lite dialect would not need keywords such as `extern` or `static`. Those features exist to allow modularity.
POSIX doesn't define a way to scope variables to a single shell script function, much less to a file.
This effectively makes all variables global, which again hurts modularity and composability.
There are solutions to this in post-POSIX shells — certainly in `bash`, `ksh93`, and `zsh` at least — but that just brings you back to point 1 above.
You can see the effect of this in style guides on GNU Autoconf macro writing, where they recommend that you prefix variable names with the name of the macro itself, leading to very long variable names purely in order to reduce the chance of collision to acceptably near zero.
Even C is better on this score, by a mile. Not only are most C programs written primarily with function-local variables, C also supports block scoping, allowing multiple blocks within a single function to reuse variable names without cross-contamination.
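A small sketch of the scoping problem; the function and variable names are made up, and the `local` shown in the comment is a post-POSIX extension, not something the standard guarantees:

```sh
count_files() {
    # POSIX gives us no way to scope this variable to the function...
    n=$(ls | wc -l)
    echo "$n"
}

n=42
count_files >/dev/null
echo "$n"            # no longer 42: the function clobbered the caller's n

# bash, ksh93, and zsh offer function-local variables, e.g. in bash:
#   count_files() { local n; n=$(ls | wc -l); echo "$n"; }
```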
Shell programming languages have no standard library.
It is possible to argue that a shell scripting language's standard library is the contents of `PATH`, but that just says that to get anything of consequence done, a shell script has to call out to another whole program, probably one written in a more powerful language to begin with.
Neither is there a widely-used archive of shell utility libraries as with Perl's CPAN. Without a large available library of third-party utility code, a programmer must write more code by hand, so she is less productive.
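To put that concretely, here's a hedged sketch of what a "standard library call" amounts to in portable shell: even trivial string handling means forking a program from the `PATH`. The variable and its value are made up:

```sh
name="warren"

# Portable shell: uppercase a string by forking tr(1) from the PATH
upper=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
echo "$upper"

# bash 4+ can do it in-process, but then you're back to the portability
# problem in point 1 above:
#   upper=${name^^}
```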
Even ignoring the fact that most shell scripts rely on external programs typically written in C to get anything useful done, there's the overhead of all those `pipe()`→`fork()`→`exec()` call chains. That pattern is fairly efficient on Unix, compared to IPC and process launching on other OSes, but here it's effectively replacing what you'd do with a subroutine call in another scripting language, which is far more efficient still. That puts a serious cap on the upper limit of shell script execution speed.
Shell scripts have little built-in ability to increase their performance via parallel execution.
Bourne shells have `&`, `wait`, and pipelines for this, but that's largely only useful for composing multiple programs, not for achieving CPU or I/O parallelism. You're not likely to be able to peg the cores or saturate a RAID array solely with shell scripting, and if you do, you could probably achieve much higher performance in other languages.
Pipelines in particular are a weak way to increase performance via parallel execution: a pipeline only lets two programs run in parallel, and one of the two will likely be blocked on I/O to or from the other at any given point in time.
There are latter-day ways around this, such as `xargs -P` and GNU `parallel`, but this just devolves to point 4 above.
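For what it's worth, here's a sketch of the `xargs -P` workaround; note that `-P` and `-0` are common GNU/BSD extensions rather than POSIX requirements, and the `gzip` workload is just a stand-in:

```sh
# Compress every .log file, running up to 4 gzip processes at once.
# The parallel workers are external programs; the shell itself is only glue.
printf '%s\0' *.log | xargs -0 -n 1 -P 4 gzip
```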
With effectively no built-in ability to take full advantage of multi-processor systems, shell scripts are always going to be slower than a well-written program in a language that can use all the processors in the system. To take that GNU Autoconf `configure` script example again, doubling the number of cores in the system will do little to improve the speed at which it runs.
Shell scripting languages don't have pointers or references.
This prevents you from doing a bunch of things easily done in other programming languages.
For one thing, the inability to refer indirectly to another data structure in the program's memory means you're limited to the built-in data structures. Your shell may have associative arrays, but how are they implemented? There are several possibilities, each with different tradeoffs: red-black trees, AVL trees, and hash tables are the most common, but there are others. If you need a different set of tradeoffs, you're stuck, because without references, you don't have a way to hand-roll many types of advanced data structures. You're stuck with what you were given.
Or, it may be the case that you need a data structure that doesn't even have an adequate alternative built into your shell script interpreter, such as a directed acyclic graph, which you might need in order to model a dependency graph. I've been programming for decades, and the only way I can think of to do that in a shell script would be to abuse the file system, using symlinks as faux references. That's the sort of solution you get when you rely merely on Turing-completeness, which tells you nothing about whether the solution is elegant, fast, or easy to understand.
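To make that symlink hack concrete, here's a purely illustrative sketch with made-up package names:

```sh
# Model "app depends on libfoo and libbar; libfoo depends on libbar"
# by abusing the file system as the data structure.
mkdir -p deps/app deps/libfoo deps/libbar
ln -s ../libfoo deps/app/libfoo
ln -s ../libbar deps/app/libbar
ln -s ../libbar deps/libfoo/libbar

# "Dereferencing" a node's outgoing edges is a directory listing:
ls deps/app        # -> libbar libfoo
```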
Advanced data structures are merely one use for pointers and references. There are piles of other applications for them, which simply can't be done easily in a Bourne family shell scripting language.
I could go on and on, but I think you're getting the point here. Simply put, there are many more powerful programming languages for Unix type systems.
> This is a huge advantage, that could compensate for the mediocrity of the language itself in some cases.
Sure, and that's precisely why GNU Autoconf uses a purposely-restricted subset of the Bourne family of shell script languages for its `configure` script outputs: so that its `configure` scripts will run pretty much everywhere.
You will probably not find a larger group of believers in the utility of writing in a highly-portable Bourne shell dialect than the developers of GNU Autoconf, yet their own creation is written primarily in Perl, plus some `m4`, and only a little bit of shell script; only Autoconf's output is a pure Bourne shell script. If that doesn't raise the question of how useful the "Bourne everywhere" concept is, I don't know what will.
> So, is there a limit to how complex such programs can get?
Technically speaking, no, as your Turing-completeness observation suggests.
But that is not the same thing as saying that arbitrarily-large shell scripts are pleasant to write, easy to debug, or fast to execute.
> Is it possible to write, say, a file compressor/decompressor in pure bash?
"Pure" Bash, without any calls out to things in the PATH
? The compressor is probably doable using echo
and hex escape sequences, but it would be fairly painful to do. The decompressor may be impossible to write that way due to the inability to handle binary data in shell. You'd end up calling out to od
and such to translate binary data to text format, shell's native way of handling data.
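Here's a small sketch of that asymmetry; the file name is made up, and the bytes happen to resemble a gzip header purely for illustration:

```sh
# Writing arbitrary bytes is doable with printf escape sequences:
printf '\037\213\010' > header.bin      # 0x1f 0x8b 0x08

# Reading them back is the problem: shell variables can't hold NUL bytes,
# so "pure" shell can't round-trip binary data. In practice you call out
# to od(1) to turn the bytes into text, shell's native data format:
od -An -tu1 header.bin                  # -> 31 139 8
```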
Once you start talking about using shell scripting the way it was intended, as glue to drive other programs in the `PATH`, the doors open up, because now you're limited only to what can be done in other programming languages, which is to say you don't have limits at all. A shell script that gets all of its power by calling out to other programs in the `PATH` doesn't run as fast as monolithic programs written in more powerful languages, but it does run.
And that's the point. If you need a program to run fast, or if it needs to be powerful in its own right rather than borrowing power from others, you don't write it in shell.
> A simple video game?
Here's Tetris in shell. Other such games are available, if you go looking.
> there are only very limited debugging tools
I would put debugging tool support down about 20th place on the list of features necessary to support programming in the large. A whole lot of programmers rely much more heavily on `printf()` debugging than proper debuggers, regardless of language.
In shell, you have `echo` and `set -x`, which together are sufficient to debug a great many problems.
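A minimal sketch of those two in action, with a made-up script fragment:

```sh
#!/bin/sh
set -x                           # trace each command as it runs, post-expansion

user=${1:-root}                  # "root" is just a default for this sketch
echo "DEBUG: user='$user'" >&2   # plain old print-style debugging
grep "^$user:" /etc/passwd

set +x                           # turn tracing back off
```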