3

I want to forward the job output from a job run by a third party scheduler. The scheduler allows me to insert the output into a command, e.g. like so:

python script.py --job-output '{joboutput}'

Where {joboutput} becomes the raw job output forwarded by the scheduler.

The issue that I have encountered is that the job output can contain unbalanced single and double quotes (as well as special characters like (, | etc.), so wrapping {joboutput} in either single or double quotes doesn't work.

I have looked through many similar questions, but have found none that cover this exact scenario. I would highly appreciate any suggestions. Many thanks!

dreamer
  • 83
  • What is the type of the --job-output argument for the python program, string? How do you run it now, assuming there are no special characters, it's not clear to me from your pseudo-code example. – thanasisp Apr 27 '22 at 22:34
  • @thanasisp It's string type. A working example would be python script.py --job-output 'random job output' – dreamer Apr 28 '22 at 08:08
  • @thanasisp script.py forwards the output on to a chat application. The scheduler is set to only trigger script.py if a job completed with an error. So it's essentially just a way to monitor job failures and display the error message. – dreamer Apr 28 '22 at 10:05
  • This question is better on Stackoverflow as it concerns programming and not Unix/Linux. – Nasir Riley Apr 30 '22 at 20:17
  • It's unclear where the error occurs. Is it when you call your Python script or is it when your Python script (which you do not show) does something with the passed string? – Kusalananda Apr 30 '22 at 20:29
  • @dreamer, does that scheduler just insert some text on a command line and then launches a shell to interpret and execute that? Because if so, dealing with arbitrary characters is very hard. The easiest would be if you knew the output didn't contain unescaped single quotes (or that any single quotes would be escaped); then you could wrap the whole thing in single quotes (like you have in the example). But you can't really insert just any string on the shell command line and have it treated as-is. The shell needs to know where the string ends, and that requires some syntax. – ilkkachu Apr 30 '22 at 21:33
  • @NasirRiley I don't think so. This is very much about how the shell can (or cannot) parse strings containing unbalanced quotes of any sort – Chris Davies May 01 '22 at 18:32

1 Answer

4

If that command line is preprocessed by the scheduler and it then sends it to a shell for execution (e.g. through sh -c), after doing a simple text replacement of {joboutput} with the actual text, then what you're asking can't really be done. Not on just one line anyway.

It is possible to pass an arbitrary (NUL-terminated) string as a command line argument (up to some maximum length anyway), but inserting the string on the shell command line requires following the shell's syntax/quoting rules.

Basically, the shell has double quotes, single quotes and backslash escapes. Within double quotes, you need to escape a number of characters by prefixing them with backslashes, so you can't put an arbitrary string inside those. Within single quotes, you don't need to escape anything else, but the single quotes themselves need special processing. The usual way is to replace each single quote with '\'', which closes the quoted string, inserts an escaped single quote, and reopens the quoted string. In any case, there are some characters that need to be treated specially, and no way around it. It has to be that way, as the shell needs some way of determining where the quoted string ends.
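
As a minimal sketch of the '\'' idiom described above: the helper name sq_escape is hypothetical, and this only helps if something can transform the output before the scheduler pastes it into the command line, which the question doesn't confirm.

```shell
# Hypothetical helper: wrap a string in single quotes so the shell reads it back
# verbatim. Each embedded ' becomes '\'' (close quote, escaped quote, reopen quote).
sq_escape() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

raw="it's \"odd\" (a | b) \$HOME"
quoted=$(sq_escape "$raw")
printf '%s\n' "$quoted"

# Round trip: let the shell parse the quoted form again and compare with the input.
eval "restored=$quoted"
[ "$restored" = "$raw" ] && echo "round trip OK"
```

Since the helper uses a command substitution, trailing newlines in the input are dropped.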

So, if you use "{joboutput}" and {joboutput} is replaced with something that contains ", $, \ or `, it'll break; and if you use '{joboutput}' and {joboutput} is replaced with something that contains a ', it'll break.
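
To make the single-quote case concrete, here is a small sketch. The function run_pasted is a hypothetical stand-in for the scheduler, assuming it pastes the text between single quotes and hands the line to sh -c; printf stands in for script.py.

```shell
# Hypothetical stand-in for the scheduler: paste the payload between single
# quotes, as a plain text replacement would, and let sh -c parse the result.
run_pasted() {
    sh -c "printf '%s\n' '$1'" 2>/dev/null
}

# No quotes in the payload: arrives intact.
run_pasted "plain output"

# Balanced single quotes: the command runs, but the quotes are silently dropped.
run_pasted "say 'hi' now"

# A lone single quote: the shell sees an unterminated string and refuses to run.
run_pasted "it's broken" || echo "shell rejected the command line"
```

The second case is the sneaky one: nothing fails, but script.py would receive "say hi now" rather than the original text.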

Some languages might have more verbose quotes, like Python's """/''', which might help in that they'd allow the occasional lone quote or quote pair, but would of course still not be totally general, since the output could contain the same """ or '''.

The shell doesn't have those, though. The nearest thing is probably here-docs, which are delimited by a freely-chosen line. That would help for embedding almost arbitrary strings, but it does require being able to pass multiple lines, and is rather hairy.

If it's possible to use a multiline command, the following should work as long as the text replacing {joboutput} doesn't contain a line consisting of just END_OF_JOB_OUTPUT, barring parsing bugs in the shell related to the here-doc inside a command substitution. You could change the here-doc separator to any other string, one that's unlikely to appear in the output. However, since the data goes through a command substitution here, any trailing newlines in it are lost.

out=$(cat <<'END_OF_JOB_OUTPUT'
{joboutput}
END_OF_JOB_OUTPUT
)
python script.py --job-output "$out"

or without nesting the here-doc in the command substitution:

exec 9<<'END_OF_JOB_OUTPUT'
{joboutput}
END_OF_JOB_OUTPUT
out=$(cat <&9)
exec 9<&-

printf "%s\n" "$out"
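
For illustration, this is roughly what the second variant looks like once the placeholder has been replaced. The sample payload is made up; it has an unbalanced single quote, an unbalanced double quote, and a few characters that would otherwise be special.

```shell
# The quoted delimiter ('END_OF_JOB_OUTPUT') turns off all expansion in the body,
# so quotes, backticks, $ and | are read as plain text.
exec 9<<'END_OF_JOB_OUTPUT'
Error: can't open "file (1).txt | `date` $HOME


END_OF_JOB_OUTPUT
out=$(cat <&9)   # command substitution drops the trailing blank lines
exec 9<&-

printf '[%s]\n' "$out"
```

The brackets in the last line are only there to show that the two trailing blank lines of the payload are gone.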


If the scheduler was able to launch that command directly, without involving the shell, we wouldn't need to care about the shell's syntax. The scheduler would just need to have some smarts to explicitly pass script.py, --job-output and the job output as distinct arguments. But we don't know if it can do that. (Also, in that case, you wouldn't use the quotes around the placeholder.)

Another way that would be easier than the shell shenanigans above would be to pass the string through an environment variable or a file, if that's supported by the scheduler.
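
A sketch of what that could look like, assuming the scheduler can export the output in an environment variable or write it to a file. The variable name JOB_OUTPUT and the function fake_script (a stand-in for script.py) are made up for the example.

```shell
# Assumption: the scheduler exports the raw output as JOB_OUTPUT (hypothetical name).
JOB_OUTPUT="it's \"unbalanced (a | b) \$HOME \`date\`"
export JOB_OUTPUT

# Stand-in for script.py: print the number of arguments, then the second one.
fake_script() {
    printf '%s\n' "$#"
    shift
    printf '%s\n' "$1"
}

# The command line is now fixed text. The expanded value is passed as a single
# argument and is not re-parsed for quotes or operators.
fake_script --job-output "$JOB_OUTPUT"

# File variant: read the payload back from a file (a temporary one here).
tmp=$(mktemp)
printf '%s\n' "$JOB_OUTPUT" > "$tmp"
from_file=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$from_file"
```

With the real script, the fixed command line would be python script.py --job-output "$JOB_OUTPUT". As with the here-doc, the $(cat ...) in the file variant drops trailing newlines.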

ilkkachu
  • 138,973
  • 1
    Just adding a tidbit of info: @ikkachu chose to surround END_OF_JOB_OUTPUT within singlequotes ( 'END_OF_JOB_OUTPUT' ) as without those singlequotes, the shell would try to evaluate things within ${joboutput} ( for ex, replacing $something with its value (or empty string if not defined) at the reading shell level, etc). – Olivier Dulac May 02 '22 at 13:56
  • @ilkkachu Thanks very much! Your here-docs example is what I was looking for. It appears that even as a one liner this works (I added a semicolon before the python command). Appreciate your help! – dreamer May 03 '22 at 18:26
  • @dreamer, ehmh? The shell needs newlines for the here-doc, so now I'm left confused how it works as a one-liner? :) Unless of course the scheduler has some weird way of letting you enter those newlines... – ilkkachu May 03 '22 at 18:33
  • @ilkkachu Actually - I jumped the gun, only for an example I tried which had balanced quotes. I don't think the scheduler accepts multi line commands but I will have a play around. In any case, I think this is probably the closest to a solution for my particular case I'm going to get - so hopefully I can make it work – dreamer May 03 '22 at 19:07
  • @dreamer, if it has any other way of inserting the data than straight replacement on the command line, that would be really good. Or if there's any way to mangle the inserted string (where ever it originally comes from) so that it can be inserted more safely. (i.e. drop any single quotes, or replace them with '\'') – ilkkachu May 03 '22 at 19:14
  • @ilkkachu Good suggestions, makes sense, thanks! I will do more digging in the documentation. Will report back if I have any luck. – dreamer May 03 '22 at 19:22
  • @ilkkachu I'm wondering if perhaps there are ways to achieve one line here docs, see e.g. https://unix.stackexchange.com/questions/256539/is-it-possible-to-do-a-here-document-in-one-line-or-echo-verbatim and https://unix.stackexchange.com/questions/370098/single-line-heredocument-possible-in-bash . Just haven't quite worked out yet if it can be adapted to my scenario. – dreamer May 04 '22 at 09:29
  • @dreamer, none of that really helps, as far as I can think of. Here-strings need string delimiters like any string, and shoving a multi-line string through eval also needs that. You'd need a quoting operator that terminates on the newline, and doesn't care about anything else. And I'm not sure if any language has one. – ilkkachu May 04 '22 at 10:49
  • but yeah, if the tool allowed exporting the string through the environment, it'd be as easy as python script.py --job-output "$ENV_VAR". – ilkkachu May 04 '22 at 10:50
  • Thanks @ilkkachu - indeed I think you're right - the above 2 solutions still require quote delimiters. It's a shame bash doesn't just allow triple quotes like Python. – dreamer May 04 '22 at 13:30