
Say I have an executable script process_image that performs actions on a base64-encoded image. I store every image in a file, images_file, one per line: every line of images_file is a single base64-encoded image. Some of the lines are very long, so the following fails with `xargs: argument line too long`:

cat images_file | xargs -L1 process_image

I wanted to modify process_image to read the entire output of cat images_file from stdin and then loop over each line using a simple while loop, but my colleagues have advised against this approach. Does xargs -L1 internally use the same mechanism as a while loop? How would using xargs be more desirable than using a while loop? What is the maximum argument length that xargs can handle, and is there any way to overcome it while maintaining the cat <file> | xargs -L1 <executable_script> approach?
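For reference, the while-loop alternative under discussion would look something like this (a minimal sketch; it assumes process_image takes the base64 string as its single argument, the same way xargs -L1 would pass it):

```shell
# Read images_file one line at a time and pass each line to
# process_image as a single argument. IFS= and read -r keep each
# line intact (no whitespace trimming, no backslash interpretation),
# matching what xargs -L1 would hand over.
while IFS= read -r image; do
    process_image "$image"
done < images_file
```

Note that if process_image is an external program, each line is still passed through execve(), so the kernel's per-argument limit still applies; the loop only sidesteps xargs' own command-line buffer.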

sriganesh
  • Can the process_image script or program read the image data from stdin, or can you give it a command-line argument that specifies a file to read the image from (for example, -f /path/to/image-file)? – Sotto Voce Jul 14 '22 at 07:27
  • The images_file is generated at runtime by a previous process. I could find many workarounds to this, but I am not being allowed to. I want to know if I can specifically use xargs -L1 to allow an argument whose length exceeds the default allowed limit. – sriganesh Jul 14 '22 at 07:32
  • @sriganesh Why not run a simple shell loop to circumvent this problem, instead of using xargs? Try dumping the value of ARG_MAX, using getconf ARG_MAX – Inian Jul 14 '22 at 07:37
  • Linux has a hard-coded length limit for a single command-line argument, and you might be crashing into that. See What defines the maximum size for a command single argument? The solution in that case, such as it is, would be to switch to another Unix without that limit. Or to pass data like that using files or pipes instead of command-line arguments... – ilkkachu Jul 14 '22 at 07:38
  • I asked about process_image rather than images_file, but if you're set on xargs, okay. I don't see anything in the xargs man page that suggests -L1 would help you, but I do see -s somelargenumber might help. (Although the man page says the default value is ARG_MAX - 4096, so there doesn't seem to be a lot of space to gain.) – Sotto Voce Jul 14 '22 at 07:41
  • @sriganesh Has your colleague given a reason for advising against having process_image loop over input? – muru Jul 14 '22 at 07:53
  • Do you use the "Trailing blanks cause an input line to be logically continued on the next input line" part of -L? If not, a simple loop reading one line at a time might suffice – Chris Davies Jul 14 '22 at 08:25
  • @SottoVoce I am not allowed to create intermediate files for storing the generated images and using the process_image script to read these files from a location. I want to know whether xargs -L1 anyway uses the same mechanism as while. My boss insists that I do not use for or while loops. So if xargs also uses loops, why not use a while loop? Either that, or I want to be able to use xargs to allow a size larger than ARG_MAX. I am a fresher with no prior experience in shell scripting and I am working under many heavy design constraints (no loops, no intermediate files). – sriganesh Jul 14 '22 at 08:37
  • @sriganesh, what OS are you on? What size are the command line arguments you need to pass? (the images, the lines in your file) – ilkkachu Jul 14 '22 at 08:38
  • There are many people who have this idea that while loops are somehow bad, but they are wrong. This is classic cargo cult programming: shell loops are a bad tool to process text files, so then people think they're a bad tool in general. There is nothing wrong with using a shell loop if it is the right tool for the job. – terdon Jul 14 '22 at 08:40
  • Shell loops are just fine if what you're doing is calling other programs with the data you loop over. But, if the person forbidding them is your boss, the technical arguments might not matter and you get to do what they want anyway. (The question says "colleagues", the comment says "boss", those are slightly different.) – ilkkachu Jul 14 '22 at 08:42
  • @ilkkachu Ubuntu Focal64. They are images between 10-50 kB being encoded in base 64. – sriganesh Jul 14 '22 at 08:43
  • @sriganesh, hmm, curious. As far as I understand, the Linux limit is 128 kB, which should be enough for a 50 kB image, even if that was before the base64 encoding. Can you check what happens if you take the largest image you can find, put that alone in a file, and then run xargs over that? Check to see if it runs, and what the exact length of the file was. – ilkkachu Jul 14 '22 at 08:46
  • Presumably this is the same boss/colleague who doesn't believe that & truly runs processes in parallel – Stephen Kitt Jul 14 '22 at 10:05
  • Why does your process_image program take its image as a command-line argument? That would be reasonable for very small amounts of data, but is absurd when you're talking about many kilobytes or more of data, and likely to run into ARG_MAX command-line length limits (especially on systems that have smaller ARG_MAX than Linux does). Instead, your program should take its data from a file or files, with the argument being the filename(s) containing the image(s), or from stdin. That is what you need to fix. – cas Jul 14 '22 at 11:27
  • @sriganesh I wasn't going to suggest that you write the image data to files on disk, but rather use techniques like echo -n "$imagedata" | process_image or process_image <<<"$imagedata" or perhaps even process_data -f <( echo -n "$imagedata" ) to give the data to the command without creating oversize command-line arguments (which my first and third examples still do). But they depend on process_image accepting file data on stdin or using an argument for passing a file path on the command line, which is still not known in this discussion. – Sotto Voce Jul 14 '22 at 13:56
  • "takes stdin ... using xargs" makes no sense. xargs' job is to convert its input (stdin or a file) and put it onto the command line as arguments for another program. That program processes command-line arguments, it does not process stdin. As I said earlier, that is the thing you need to fix. It is the source of your problems, and they will not go away until you rewrite your process_image program so that it reads its data from stdin (or from a file) - taking bulk data from command line args is beyond absurd, it is insane. – cas Jul 14 '22 at 16:16
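As a concrete way to act on Inian's and ilkkachu's suggestions above, the relevant limits can be inspected like this (a sketch; the 128 KiB figure assumes Linux with 4 KiB pages, where a single argument is capped at MAX_ARG_STRLEN = 32 * page size, separately from ARG_MAX):

```shell
# Total space for all arguments plus the environment (the limit that
# xargs -s defaults near):
getconf ARG_MAX

# Length of the longest line in images_file; on Linux each individual
# argument must also stay under MAX_ARG_STRLEN (32 pages, i.e. 128 KiB
# with 4 KiB pages), regardless of ARG_MAX.
awk '{ if (length($0) > max) max = length($0) } END { print max }' images_file

# Feed only the longest line to xargs to see whether that line alone
# triggers the error:
awk 'length($0) > length(line) { line = $0 } END { print line }' images_file |
    xargs -L1 echo > /dev/null && echo "longest line fits on a command line"
```

If the longest line already fails on its own, the per-argument kernel limit (not xargs' -s buffer) is the bottleneck, and no xargs option will raise it.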

0 Answers