
In a makefile, I have

@echo "$(IGNORE_DIRS) $(CLEAN_FILES) $(CLEAN_DIRS) $(REALCLEAN_FILES)" | tr ' ' '\n' >> $@

The problem is that $(CLEAN_FILES) is quite large, so when I run make, I get

make: execvp: /bin/sh: Argument list too long

I'm on Xubuntu 18.10.

Edit: I should provide a little more context. What I am working on is a make rule (I'm using GNU make) to automatically generate the .hgignore file. Here is the make rule in its entirety:

.hgignore : .hgignore_extra
    @echo "Making $@"
    @rm -f $@
    @echo "# Automatically generated by Make. Edit .hgignore_extra instead." > $@
    @tail -n +2 $< >> $@
    @echo "" >> $@
    @echo "# The following files come from the Makefile." >> $@
    @echo "syntax: glob" >> $@
    @echo "$(IGNORE_DIRS) $(CLEAN_FILES) $(CLEAN_DIRS) $(REALCLEAN_FILES)" | tr ' ' '\n' >> $@
    @chmod a-w $@
.PHONY : .hgignore

Edit 2: At @mosvy's suggestion, I have also tried

.hgignore : .hgignore_extra
    @echo "Making $@"
    @rm -f $@
    @echo "# Automatically generated by Make. Edit .hgignore_extra instead." > $@
    @tail -n +2 $< >> $@
    @echo "" >> $@
    @echo "# The following files come from the Makefile." >> $@
    @echo "syntax: glob" >> $@
    $(file >$@) $(foreach V,$(IGNORE_DIRS) $(CLEAN_FILES) $(CLEAN_DIRS) $(REALCLEAN_FILES),$(file >>$@,$V))
    @true
    @chmod a-w $@
.PHONY : .hgignore

Running make .hgignore with this, I no longer get the "Argument list too long" error, but the generated .hgignore file only contains output up to the syntax: glob line, and then nothing after that.

teerav42
  • What is the total string size you are talking about? Modern OS allow 1-2MB – schily Nov 09 '18 at 03:40
  • Drop the $(file >$@) part, that's only there to truncate the file if it already existed, but you already have contents there, so you don't want it... Is that line indented with space or with a TAB character? It needs to be a TAB for it to work... You also don't need the @true, since you have another rule following that one... – filbranden Nov 09 '18 at 21:26
  • I know that make requires tab indents. I removed $(file >$@) and @true, but it still doesn't work. – teerav42 Nov 09 '18 at 21:47
  • Taking a step back... I'd argue the idea of generating .hgignore is at least questionable... Particularly since it supports wildcards such as *.o, *.pyc, etc. – filbranden Nov 09 '18 at 23:41
  • Fair point. However I was not the original author of this makefile; I just have to help maintain it when I run into issues like this. Furthermore, it is for a rather complicated project (it builds an entire textbook). So I don't have the authority to change the make structure of the project, and even if I did it would be extremely impractical. – teerav42 Nov 09 '18 at 23:55
  • I believe the problem you're running into now (with @mosvy's solution) is that Make first expands all the recipe lines (thereby running any Make functions in them, such as $(file)) before it runs any of the shell commands. Try using Make functions only: you can do most of that using $(file), and $(shell) will be useful too, e.g. $(shell chmod a-w $@) or $(file >>$@,$(shell tail -n +2 $<)) – filbranden Nov 10 '18 at 06:02

4 Answers


As @schily has already explained, this is not a shell problem, and it cannot be worked around with xargs, quoting, splitting into several echo commands with ;, etc. All the text of a make action is passed as arguments to a single execve(2) call, and it can't be longer than the maximum size allowed by the operating system.
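The limit can be reproduced outside make. What follows is a minimal sketch, not something from the original answer: it passes roughly 140,000 bytes to /bin/sh once as the single string after -c (the way make hands over a recipe line) and once as many separate arguments. The variable names are made up for illustration, and the outcome of the first case is an assumption that holds on Linux, where a single argument may not exceed 128 KiB; other systems may accept both.

```shell
# Sketch only: one oversized argument versus the same bytes split into
# many small arguments. Variable names are illustrative.
words=$(awk 'BEGIN { for (i = 0; i < 20000; i++) printf "foobar " }')   # 140,000 bytes

# Case 1: everything inside the single string given to sh -c,
# which is how make hands a recipe line to the shell.
single_status=0
/bin/sh -c ": $words" 2>/dev/null || single_status=$?

# Case 2: the same words passed as 20,000 separate arguments.
split_status=0
/bin/sh -c ':' sh $words 2>/dev/null || split_status=$?

echo "single argument: exit status $single_status"
echo "split arguments: exit status $split_status"
```

On Linux the first status should be non-zero (the exec fails with "Argument list too long") while the second should be 0. That is why nothing done inside the recipe line can help: the line itself is the oversized argument.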

If you're using GNU make (the default on linux), you can use its file and foreach functions:

TEST = $(shell yes foobar | sed 200000q)

/tmp/junk:
        $(file >$@) $(foreach V,$(TEST),$(file >>$@,$V))
        @true

.PHONY: /tmp/junk

This will print all words from $(TEST) separated by newlines into the file named in $@. It's based on a similar example from make's manual.

Your Makefile could probably be reworked into something more manageable, that doesn't require fancy GNU features, but it's hard to tell how from the snippets you posted.

Update:

For the exact snippet from the question, something like this could do:

.hgignore : .hgignore_extra
    $(info Making $@)
    $(file >$@.new)
    $(file >>$@.new,# Automatically generated by Make. Edit .hgignore_extra instead.)
    $(shell tail -n +2 $< >>$@.new)
    $(file >>$@.new,)
    $(file >>$@.new,# The following files come from the Makefile.)
    $(file >>$@.new,syntax: glob)
    $(foreach L, $(IGNORE_DIRS) $(CLEAN_FILES) $(CLEAN_DIRS) $(REALCLEAN_FILES), $(file >>$@.new,$L))
    @mv -f $@.new $@
    @chmod a-w $@
.PHONY : .hgignore

I've changed it a little, so that it first writes into .hgignore.new and, only if everything goes well, moves .hgignore.new to .hgignore. You'll have to change the indenting spaces back to tabs, because this dumb interface is mangling whitespace.

  • This seems like it should work for me, but no luck yet. I edited my question to show what my entire make rule looks like. I replaced my line @echo "$(IGNORE_DIRS) $(CLEAN_FILES) $(CLEAN_DIRS) $(REALCLEAN_FILES)" | tr ' ' '\n' >> $@ with your two lines, but when I run make .hgignore, the generated .hgignore file only goes up to the line that says syntax: glob, and nothing after that. – teerav42 Nov 09 '18 at 20:30
  • make's macros are expanded before (i.e. active commands in them are executed before) the action is passed to the shell. That means that your rm $@ will wipe whatever $(file >>$@,...) has written. I'll fix your snippet when I get a device I can type on (not a phone) –  Nov 09 '18 at 23:46
  • So you are saying that rm -f $@ actually runs after $(file >>$@,...) ? That would explain the issue. I'm not sure I fully understand why it would go in this order though, could you possibly clarify why? Thanks so much! – teerav42 Nov 09 '18 at 23:58
  • Since macros like $(foo) and $(file >outfile,stuff) are expanded before the action is passed to the shell, their side effects (e.g. writing stuff to outfile) also happen before the shell commands. –  Nov 10 '18 at 07:09
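The ordering described in these comments can be checked with a throwaway makefile. This is only a sketch under assumptions: it needs GNU make 4.0 or newer on the PATH (older versions have no $(file ...) function), and the target name demo and the file log.txt are invented for the demonstration. The makefile is generated with printf so that the recipe lines get real tab characters.

```shell
# Sketch, assuming GNU make 4.0 or newer (needed for $(file ...)) on PATH.
# The directory, target and file names are made up for the demonstration.
demo_dir=$(mktemp -d)

# printf supplies the literal tab that make requires before recipe lines.
printf 'demo:\n\t@echo "shell line 1" >> log.txt\n\t$(file >>log.txt,make function on line 2)\n\t@echo "shell line 3" >> log.txt\n' > "$demo_dir/Makefile"

# Run the rule, then show the order in which the lines were written.
(cd "$demo_dir" && make -s demo)
cat "$demo_dir/log.txt"
```

If make behaves as described above, the text written by $(file ...) lands in log.txt before either echo line, even though it sits on the second recipe line. In the question's rule the same effect means the rm -f and the first echo run after the file list has already been written, which wipes it.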

On UNIX the following rule applies:

The following data forms the initial stack of a process:

  • The sum of strlen() of all environment strings + final nul character per string
  • The sum of strlen() of all argument strings + final nul character per string
  • The environment array: n+1 environment strings * sizeof char *
  • The argv array: n+1 argument strings * sizeof char *
  • A few additional numbers

All this data must not exceed ARG_MAX.

On a historical UNIX, the value for ARG_MAX was 10240 or 20480 bytes.

SunOS-4.0 (published in December 1987) raised that limit to 1MB.

Solaris-7.0 (published in 1997) introduced 64 Bit support. In order to avoid a practically smaller limit on 64 Bit systems (caused by the bigger env and argv arrays that result from a larger char *), ARG_MAX was raised to 2MB for 64 bit programs.

BTW: Modern POSIX compliant OS include support for the getconf program, and getconf ARG_MAX prints the actual value. On a 64 Bit Linux, this returns 2MB, so Linux at first glance seems to have adopted the SunOS enhancements....
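The accounting above can be approximated from the shell. The function below is a rough sketch rather than an exact model of what any kernel does: arg_cost is a hypothetical helper name, the pointer size of 8 bytes assumes a 64 bit process, and the "few additional numbers" from the list are ignored.

```shell
# Rough estimate, not an exact kernel accounting: bytes that a list of
# argument strings adds to the initial stack (string + NUL + pointer).
arg_cost() {
    ptr_size=8   # assumption: 64 bit process, so sizeof(char *) is 8
    total=0
    for word in "$@"; do
        # ${#word} counts characters, so multibyte text is undercounted.
        total=$(( total + ${#word} + 1 + ptr_size ))
    done
    echo "$total"
}

echo "ARG_MAX reported by getconf: $(getconf ARG_MAX)"
echo "Cost of three sample words:  $(arg_cost foo barbaz quux)"
```

Comparing the estimate for a real word list with the getconf value gives a quick idea of how close a command is to the total limit.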

Now let us look at make:

The make program calls commands from Makefiles via:

sh -ce command

where command is a single argument that is the expanded string you see in an action line from a Makefile.

SunPro Make introduced an optimization in the early 1990s:

  • If a command line does not contain shell meta characters, make itself tokenizes the command line and calls the command via: execv() in order to avoid the overhead from a shell call.

Later, gmake and smake adopted this optimization.

smake introduced another optimization in 2012:

  • If a shell command is introduced by a simple echo command that ends in a ; and if the following command line does not contain shell meta characters, the echo is inlined into smake and the command after the ; is executed via execv(). This reduces the overhead for modern build systems, which typically use @ to suppress make's command echo and instead use simplified echo calls to make the make output easier to understand (see e.g. the Schily Makefile system introduced in February 1993).

None of these make rules apply to your make command line, since your command line contains shell meta characters. So your whole command is called via:

sh -ce command

where command is a single string with the total size caused by your makefile.

Now it seems that the Linux kernel is neither UNIX nor POSIX compliant and enforces an additional limitation that never existed on UNIX. This additional limitation seems to be based on the maximum length of a single string.

If this is really true, this would disqualify Linux as it would prevent Linux from being useful with larger projects managed by make.

Did you think about making a bug report to the Linux kernel folks?

schily
  • Thank you for the detailed information! While a bug report for the Linux folks might be well placed, I don't think I understand all of this well enough to be able to write a useful and informative bug report myself. – teerav42 Nov 09 '18 at 20:38
  • The 32 page limit (generally 128KiB) of a single argument/env-string on Linux is clearly documented in the execve() man page. The ARG_MAX is now based on the stack size limit. In any case POSIX allows ARG_MAX to be as low as 4096, so a 128KiB limit (the max size of a single argument but also the lowest ARG_MAX allowed by Linux) is a lot more useful than what POSIX allows. – Stéphane Chazelas Nov 11 '18 at 23:30
  • The problem on Linux is that sysconf(_SC_ARG_MAX) works and returns 2 MB for a 64 bit program. In other words, Linux has a limitation that is useless (because it is unneeded) and that is not mentioned in the POSIX standard. – schily Nov 11 '18 at 23:47
  • I agree that's an annoying and unfortunate limit, but I'm arguing that at least the Linux API gives you a guarantee of at least 128K while the POSIX API only of 4K. In Linux, you can get a POSIX compliant behaviour by lowering the default stack size limit from 8M to 512K (ARG_MAX is 1/4 stack size limit or 128K whichever's highest), though that would be a bit silly. I can't see how that would disqualify Linux. Except maybe GNU hurd, the size of a single argument is limited (if only by ARG_MAX) on all systems. On FreeBSD, AFAICT, ARG_MAX is 256K and cannot be changed at runtime – Stéphane Chazelas Nov 12 '18 at 10:58
  • First: we are no longer in the 1980s, so there is no reason to keep such a (by today's standards) artificial limit. Linux had better remove it. The problem with that limit is that it may prevent you from using make with larger projects. – schily Nov 12 '18 at 11:11
  • @schily that would be an absolutely terrible idea. It's a POSIX thing. It would also mean the C standard would have to change. As well as many programs that rely on it, many of which are useful but no longer maintained. In your attempt to 'fix' things it would cause a mass breakage. – Pryftan Apr 03 '23 at 19:15

The ur-question in such cases is: why are you doing this in the first place? It looks like you are trying to create some sort of .ignore file containing all the files, which are presumably either program output or globs. In the former case I would advocate passing the program output directly to tr (if strictly necessary) and then to the file, without passing through an intermediary variable. If you're using globs then you can use a very simple for loop:

for file in *
do
    echo "$file" >> out.txt
done

(Escape if necessary in your Makefile.)

In very general terms:

  • Use pipes and redirects rather than variables for any large chunks of data. They are really fast (since they can be handled as fast as every program in the pipeline can process input).
  • Use xargs when you really have to.
  • Avoid useless echos and cats.
l0b0
  • The problem doesn't seem to be with too many arguments, but with a single one that is too long... – filbranden Nov 09 '18 at 03:31
  • Your proposal does not apply, since the problem is the interaction between make and the called commands. The problem does not exist from the shell view and your proposal only works at shell level. – schily Nov 09 '18 at 10:18

See this answer, in particular this quote:

one argument must not be longer than MAX_ARG_STRLEN (131072). This might become relevant if you generate a long call like "sh -c 'generated with long arguments'".

On the other hand, the limit on total number of arguments to pass is quite high, so maybe instead just pass them as separate arguments, since the end result of the echo will be the same?

This just removes the "s from your original command:

@echo $(IGNORE_DIRS) $(CLEAN_FILES) $(CLEAN_DIRS) $(REALCLEAN_FILES) | tr ' ' '\n' >> $@

Hopefully this is enough to keep the arguments within the limits for number of arguments (which is quite high) and length of each argument (which will no longer be an issue, since each argument is now short.)


UPDATE: Removing the quotes won't really fix this completely, since make runs each command in a shell, using the equivalent of sh -c '...' where the whole command becomes a single argument. So it will still be bound to the length limit for an individual argument, which is 131,072 bytes on Linux on x86 platforms.
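A quick way to see which of the two limits a command hits is sketched below. It is only an illustration: check_cmd_len is a hypothetical helper, the per-argument constant assumes Linux with 4 KiB pages, and 172,430 is the command length the asker measured in the comments.

```shell
# Hypothetical helper: report whether a command string of a given byte
# length fits the two limits discussed in this thread.
MAX_ARG_STRLEN=131072   # 32 pages of 4 KiB; Linux-specific assumption

check_cmd_len() {
    len=$1
    arg_max=$(getconf ARG_MAX)
    if [ "$len" -ge "$MAX_ARG_STRLEN" ]; then
        # The terminating NUL counts too, so exactly 131072 is already too long.
        echo "too long for one argument"
    elif [ "$len" -ge "$arg_max" ]; then
        echo "too long in total"
    else
        echo "fits"
    fi
}

check_cmd_len 172430
```

With that measurement the per-argument check fails well before the total ARG_MAX comes into play, which matches the error make reports.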

filbranden
  • If you believe this is the limit, you seem to work on an OS from the early 1980s. SunOS changed this 30 years ago to 1MB for 32 bit programs, and 21 years ago, with the introduction of 64 bits, the 64 bit limit became 2 MB. BTW, the size of a single string is not important, it is the total size. – schily Nov 09 '18 at 03:37
  • @schily I guess that's dependent on the OS... Linux still defines this (as of 4.19.1) as 32 pages, which on x86 and x86-64 still means 131,072 bytes. The OP says Xubuntu 18.10, which means Linux, so that still applies. Regardless of the exact limit, the issue seems to be with the length of the argument, breaking it into multiple arguments should help work around the issue (especially considering it's easy to do.) – filbranden Nov 09 '18 at 03:41
  • As mentioned: only the total size is important unless your OS is written in a silly way – schily Nov 09 '18 at 03:43
  • @schily I can confirm that at least on Linux the length limit is per argument, see copy_strings. The OP specifically mentioned Linux, so I imagine they're interested in how this works on Linux. – filbranden Nov 09 '18 at 03:46
  • If you are correct, the recommendation would be to switch to a modern OS... – schily Nov 09 '18 at 03:47
  • Removing the "s unfortunately did not solve the issue. I did remove the @ from the line in the makefile; this way make prints the values of the variables to the terminal. I copied and pasted this output to a text file. It is 172,430 bytes. Using grep and wc to count the number of spaces and double spaces in the file, I determined that there are roughly 5500 different files names there. – teerav42 Nov 09 '18 at 04:16
  • @teerav42 Check what getconf ARG_MAX returns, that's the maximum number of bytes for all the arguments together. It's calculated based on the limit for the stack size (ulimit -s), so if you customized that, it might be causing the issue you're seeing. – filbranden Nov 09 '18 at 05:52
  • @teerav42 You did the right test to get the total length of the command. See my in depth explanation in my own answer... – schily Nov 09 '18 at 10:06