
Can memory be preallocated in bash for arrays of a defined size? I am iteratively adding strings to an array (declared by declare -a arr) in a loop (arr+=("$str")), and am wondering whether performance would be improved by preallocating memory of the appropriate size (e.g., 10,000 elements containing strings not larger than 512 characters).
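
For reference, here is a sketch of the pattern the question describes, next to bash's mapfile builtin (also called readarray, bash 4+), which reads every line of its input into an array in one call and is usually much faster than a read loop on large inputs. The sample input is made up. Neither form takes a size: bash grows the array as elements are set.

```shell
#!/usr/bin/env bash
# Sketch: two ways to fill an array from line-oriented input (bash 4+).
# Neither reserves space up front; bash grows the array as elements are set.
input=$'first line\nsecond  line\nthird'   # made-up sample data

# 1) The pattern from the question: append one element per loop iteration.
declare -a arr=()
while IFS= read -r str; do
  arr+=("$str")
done <<< "$input"

# 2) mapfile (also called readarray) reads every line in one builtin call;
#    -t strips the trailing newline from each element.
mapfile -t arr2 <<< "$input"

printf '%d %d\n' "${#arr[@]}" "${#arr2[@]}"   # prints "3 3"
```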

user001
  • If you want performance, use a shell other than bash. Or, even better, don't use a shell at all. If you need arrays, chances are you need a proper programming language rather than a shell. – Stéphane Chazelas Sep 18 '17 at 20:20
  • Bash simply lacks a lot of important data and control structures; if you want to do big things with it, you will fail (or you will spend a very big effort for half a result). To play with 10,000 strings in an array in a Linux scripting environment, in your place I would probably use Go (particularly if you also intend to keep training your programming skills). Others would say Perl, Python, a different shell, Node.js or C. – peterh Sep 18 '17 at 22:52

1 Answer


No, that's not possible in bash.
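
A quick way to see that there is nothing to preallocate: bash indexed arrays are sparse, and declare -a takes no size. Setting a high index creates just that one element, not the slots before it. A minimal sketch:

```shell
#!/usr/bin/env bash
# Sketch: bash indexed arrays are sparse, so there is no capacity to reserve.
declare -a arr     # declare -a accepts no size argument
arr[9999]='last'   # only this one element is created, not 10,000 slots

echo "${#arr[@]}"  # prints 1: the number of elements actually set
echo "${!arr[@]}"  # prints 9999: the only index in use
```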

To speed up your script, try rethinking your program flow and logic. It is very seldom necessary to read huge amounts of data into a variable or array.

Most Unix tools are filters that allow you to send data from one stage of a pipeline to the next without storing much of the initial or intermediate data in memory (often just one line of a file at a time). It's uncommon to read a dataset into a variable and then manipulate it in the shell. It's more common to run transformations on the data, possibly aggregating parts of it along the way.
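
As a small illustration of the filter style (the function name total_size_of_logs and the "name size" input format are hypothetical), the pipeline below transforms with grep and aggregates with awk, keeping only a running total in memory rather than an array of lines:

```shell
#!/usr/bin/env bash
# Sketch of the filter style: each stage reads a line, acts on it and passes
# it on; nothing is collected into a shell array.
# Hypothetical input format: "<name> <size-in-bytes>" per line.
total_size_of_logs() {
  grep '\.log ' |                                   # transform: keep only .log entries
    awk '{ total += $2 } END { print total + 0 }'   # aggregate: running sum
}

printf '%s\n' 'a.log 10' 'b.txt 99' 'c.log 32' | total_size_of_logs
```

With the three sample lines it prints 42; the "+ 0" makes awk print 0 rather than an empty line when nothing matches.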

If you find yourself handling shell variables with more than a few words in them, then chances are there's a more efficient way of doing it.

Kusalananda
  • Thanks for your helpful reply. In this case, I am chunking commands (each string is a command) and then writing the strings to a script once a limit on some counter has been reached. The program flow is: | <while loop that builds a command for each element of stdin, storing the commands in an array, with a conditional that allows the contents of the array to be dumped to a file which is then launched when the array size reaches a predetermined limit> – user001 Sep 18 '17 at 20:22
  • @user001 Sounds like you're reinventing xargs. – Kusalananda Sep 18 '17 at 20:23
  • Hmm, I hadn't thought of using xargs for a shell script (I've only ever used it for standard system binaries). (The content of my while loop is essentially a short shell script.) I'll give it a try, thanks. – user001 Sep 18 '17 at 20:29
  • Thanks for your advice on xargs. I was able to achieve the desired behavior simply by using xargs -L <N> ./script.sh, where script.sh merely contains a short preamble and then printf "%s\n" "$@". – user001 Sep 18 '17 at 20:50
  • Actually, there is one problem: I normally use another counter to give the chunks separate names (e.g., commands01.sh, commands02.sh) so that they don't overwrite one another. Is there some way that a counter can be accessed with xargs? The flow is: | xargs -L <N> ./script.sh, in which script.sh takes the commands, writes a script file, and notifies a launcher that the script is ready to be run. Is there a simple way by which a counter (from xargs) could be passed to script.sh to give the files separate names? (I could also use date +%s.) – user001 Sep 18 '17 at 21:33
  • Perhaps the only way to accomplish this would be to have the script called by xargs determine the count by reading it from a file and updating the value in the file at each execution. – user001 Sep 18 '17 at 21:56
  • @user001 Your script will be called with N arguments (if using xargs -L N). This number should be available in $#. You may iterate over "$@" in the script to loop over the command-line arguments. – Kusalananda Sep 19 '17 at 13:13
  • Thanks. I meant whether the number of times xargs has processed a batch of N arguments is exposed (e.g., if there are 10*N arguments, can the first N be distinguished from the second N and so forth by some environment variable?). I don't believe this is the case, and use of the file system (to store the counter) may be necessary. – user001 Sep 19 '17 at 20:13
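
The file-based counter suggested in the comments above could look roughly like the sketch below. Assumptions: chunk_commands, .write_chunk.sh, .counter and CHUNK_DIR are hypothetical names; the input has one command per line; xargs runs without -P, so invocations happen one after another and the counter needs no locking. It feeds xargs NUL-separated lines with xargs -0 -n N (supported by GNU and BSD xargs) rather than -L N, because -L still splits each line on blanks and interprets quotes, so printf "%s\n" "$@" in the worker would print a command like echo a b as three separate lines.

```shell
#!/usr/bin/env bash
# Sketch only: chunk_commands, .write_chunk.sh, .counter and CHUNK_DIR are
# hypothetical names. Assumes xargs runs without -P (one invocation at a
# time), so the file-based counter needs no locking.

# chunk_commands N DIR: read one command per line from stdin and write them,
# at most N per file, to DIR/commands01.sh, DIR/commands02.sh, ...
chunk_commands() {
  local n=$1 dir=$2
  mkdir -p "$dir"
  printf '0\n' > "$dir/.counter"

  # Worker standing in for script.sh: bumps the counter stored in a file,
  # then writes its arguments (one command each) to the next numbered file.
  cat > "$dir/.write_chunk.sh" <<'EOF'
count=$(( $(cat "$CHUNK_DIR/.counter") + 1 ))
printf '%s\n' "$count" > "$CHUNK_DIR/.counter"
out=$(printf '%s/commands%02d.sh' "$CHUNK_DIR" "$count")
{
  printf '#!/bin/sh\n'   # short preamble
  printf '%s\n' "$@"     # one command per line
} > "$out"
EOF

  # Newlines become NUL bytes so that each whole line is a single argument;
  # xargs -0 -n N then hands the worker at most N lines per call.
  tr '\n' '\0' | CHUNK_DIR=$dir xargs -0 -n "$n" bash "$dir/.write_chunk.sh"
}

# Example: five commands in chunks of two -> commands01.sh .. commands03.sh
demo_dir=$(mktemp -d)
printf '%s\n' 'echo one' 'echo two' 'echo three' 'echo four' 'echo five' |
  chunk_commands 2 "$demo_dir"
ls "$demo_dir"
```

The worker is also the natural place to notify the launcher once its file is written. If xargs were run with -P, the read-increment-write on .counter could race between parallel workers and would need a lock (for example flock on Linux).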