5

Can using bash's globstar (**) operator cause an out-of-memory error? Consider something like:

for f in /**/*; do printf '%s\n' "$f"; done

When ** is used to generate an enormous list of files, one too large to fit in memory, will bash crash, or does it have a mechanism to handle this?

I know I've run ** on humongous numbers of files and haven't noticed a problem, so I am assuming that bash will use something like temporary files to store some of the list as it is being generated. Is that correct? Can bash's ** handle an arbitrary number of files or will it fail if the file list exceeds what can fit in memory? If it won't fail, what mechanism does it use for this? Something similar to the temp files generated by sort?

chicks
  • 1,112
terdon
  • 242,166
  • 1
    There is a tag for "globstar" ?? :) – AdminBee Mar 15 '21 at 12:41
  • I just created it, @AdminBee. It seemed useful since there are various **-specific questions that can be asked. Do you think it isn't helpful? – terdon Mar 15 '21 at 12:45
  • Perhaps it is. After all, searching for ** using the site's search function (as "**" or \*\*) doesn't produce any results ... and I guess if someone were searching for it, they would know the option is called "globstar" since they would need to enable it in the first place. – AdminBee Mar 15 '21 at 12:48
  • 3
    Possibly related: https://unix.stackexchange.com/a/171347/237982 – jesse_b Mar 15 '21 at 12:55
  • 2
    globstar is the (very weirdly named) option David Korn picked for enabling the recursive-globbing feature ksh93 copied from zsh over 10 years later, and which bash eventually copied as well another decade later. Several shells have added zsh-style recursive-globbing support, not all with that misnamed globstar option. Can we make the tag recursive-glob instead (and maybe a globstar alias to it for the ksh93/bash/tcsh users)? See also The result of ls *, ls ** and ls *** – Stéphane Chazelas Mar 15 '21 at 14:06
  • I interpreted your original question to mean, is ** implemented as an iterator which is evaluated incrementally as the for loop progresses, or does it generate all filenames first before the for loop starts its evaluation? – jrw32982 Mar 17 '21 at 19:40
  • 1
    @jrw32982 no, that isn't what I mean. I know it generates all file names first, my question was whether it also had a mechanism (such as writing partial file lists to temp files) to avoid running out of memory. – terdon Mar 17 '21 at 19:42

1 Answer

10

Yes, it can, and this is explicitly accounted for in the globbing library:

  /* Have we run out of memory?  */
  if (lose)
    {
      tmplink = 0;

      /* Here free the strings we have got.  */
      while (lastlink)
        {
          /* Since we build the list in reverse order, the first N entries
             will be allocated with malloc, if firstmalloc is set, from
             lastlink to firstmalloc. */
          if (firstmalloc)
            {
              if (lastlink == firstmalloc)
                firstmalloc = 0;
              tmplink = lastlink;
            }
          else
            tmplink = 0;
          free (lastlink->name);
          lastlink = lastlink->next;
          FREE (tmplink);
        }

      /* Don't call QUIT; here; let higher layers deal with it. */

      return ((char **)NULL);
    }

Every memory allocation attempt is checked for failure, and lose is set to 1 if any allocation fails. If the shell runs out of memory, it ends up exiting (see QUIT). There is no special handling, e.g. overflowing the list to disk or returning the files that have already been found.
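For illustration, each match is added to the list roughly like this; this is a paraphrased sketch of the pattern rather than the verbatim bash source (lastlink and lose are the variables from the excerpt above; nextlink is named here only for the sketch):

      /* Sketch: allocate a node for the next match; on failure, record it
         in `lose` and break out, so the cleanup shown above frees whatever
         has already been collected.  */
      nextlink = (struct globval *) malloc (sizeof (struct globval));
      if (nextlink == NULL)
        {
          lose = 1;
          break;
        }
      nextlink->next = lastlink;
      lastlink = nextlink;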

The memory requirements in themselves are small: only directory names are preserved, in a globval structure which forms a linked list, storing only a pointer to the next entry and a pointer to the string.
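Based on that description, the node layout is roughly the following (a sketch, not necessarily the exact definition in bash's source):

  struct globval
    {
      struct globval *next;   /* pointer to the next entry in the list */
      char *name;             /* pointer to the matched name */
    };

so each match costs only the two pointers plus the string itself.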

ilkkachu
  • 138,973
Stephen Kitt
  • 434,908
  • 1
    Ah, OK. This was prompted by the comment thread under this SO answer where the OP was using for f in /home/**/* and the answer suggested splitting that into for d in /home/*; do for f in $d/**; do... under the assumption that building separate lists will get around any issues with bash failing because it tries to build one single list. If I understand correctly, you are saying that this is a valid assumption and we can have a case where ** will fail but multiple, smaller ** will not. – terdon Mar 15 '21 at 13:36
  • 2
    Yes, that would be one way of reducing the memory requirement. – Stephen Kitt Mar 15 '21 at 13:53
  • 1
    In any case, it's not specific to recursive globbing. Any glob can exhaust memory if it generates enough files. See the /*/*/*/*/../../../../*/*/*/*/../../../../*/*/*/* given as an example at Security implications of forgetting to quote a variable in bash/POSIX shells for instance. – Stéphane Chazelas Mar 15 '21 at 14:09
  • @StéphaneChazelas yes, my question was prompted by a (mistaken) idea I had that bash's ** behaved differently and would use temp files to store intermediate lists to avoid running out of memory. – terdon Mar 15 '21 at 14:41
  • 1
    In fact, malloc tends not to fail on Linux systems. Your shell is likely to get killed by the OOM killer should it run out of memory. – val - disappointed in SE Mar 15 '21 at 22:22
  • @val not in this case, because the memory allocated by malloc is used immediately, so page allocation happens right after address space allocation; if there is no more memory, malloc will fail before the OOM killer gets a chance to step in. Try it, you’ll see the shell exiting with no help from the OOM killer. – Stephen Kitt Mar 16 '21 at 05:33
  • The allocations are small, so they come from the arenas on the heap, and in most cases won’t even result in a new page allocation; in such scenarios, overcommit is much less of a problem than with large allocations done with mmap or large brk changes. – Stephen Kitt Mar 16 '21 at 06:06