
I have a function posix that I use in the Git bash shell on Windows to transform DOS-style paths to normal Unix-style paths. Since DOS-style paths use a backslash as the separator, I have to quote the path argument to prevent the shell from treating each backslash as an escape for the next character. Is there any way to get the uninterpreted argument from inside my function, so that I don't need to quote it?

Here is my function, if it helps:

function posix() {
  echo $1 | sed -e 's|\\|/|g' | sed -re 's|^(.)\:|/\L\1|'
}

(By the way, I welcome any comments with tips for improving the function in other ways unrelated to solving the quoting/shell-interpretation problem.)

iconoclast
  • The answer to your question is "no." If you don't want something to be expanded, quote it. This is a standard convention that is rather fundamental to how shell scripting works. Are you trying to invent your own scripting language? – jw013 Sep 28 '12 at 16:26
  • No, I'm not. And I'm well aware of how things normally work. Are you trying to be snarky? – iconoclast Sep 28 '12 at 16:29
  • Then what are you trying to do? It sounds like you are trying to fundamentally change the way shell quoting / parsing rules work. IIRC zsh has a way to declare that certain functions should not have their arguments expanded, but zsh is a very atypical and nonstandard shell, and you won't find most of its features anywhere else. – jw013 Sep 28 '12 at 16:34
  • Furthermore, what does "uninterpreted" even mean? Do you not want the shell to split arguments on spaces anymore? What about splitting the command from its arguments -- should it split there? On what? What if the line ends in a backslash, do you want to suppress the normal line continuation mechanism? What about multiple commands that were separated by semicolons? I think you'll find that regardless of how you answer these, because they change how the shell interprets a command, it just doesn't make sense to be able to pull up a "raw" version of the command line from within a function. – Jim Paris Sep 28 '12 at 17:31
  • There may be a solution using history, but it will not work in all cases, so I don't recommend it. – Nahuel Fouilleul Sep 28 '12 at 18:57
  • @jw013: zsh is awesome, so it doesn't surprise me that it offers a way of getting around the limitations that remain in other shells. And that's exactly the kind of information I was trying to find. Thanks for the tip. – iconoclast Sep 28 '12 at 19:00
  • How is the function used? I do not understand in what context that problem arises. – miracle173 Sep 29 '12 at 06:29
  • "Uninterpreted" is an awful idea. On Windows, there is no single convention for how a command line is split into arguments. At the operating system level, you get a string. A WinMain function gets it as one argument! Now Microsoft's Visual C run-time library does it in some particular way to produce arguments for the standard C main. It's hacky, and incompatible with other schemes on the same platform. Sure, for a simple A B C command line, there is agreement, but once you bring quotes into the picture, and whatnot, all bets are off. – Kaz Nov 23 '13 at 06:48

3 Answers


The uninterpreted shell arguments are $1, $2, etc. You need to put their expansion in double quotes in most contexts, to avoid the value of the parameter being expanded further. "$@" gives you the list of all parameters.

For example, if you want to pass an argument of the shell script to your function, call it like this:

first_argument_as_filename_in_unix_syntax=$(posix "$1")

The double quotes are necessary. If you write posix $1, then what you're passing is not the value of the first parameter but the result of performing word splitting and globbing on the value of the first parameter. You will need to use proper quoting when calling the script, too. For example, if you write this in bash:

myscript c:\path with spaces\somefile

then the actual, uninterpreted arguments to myscript will be c:path, with and spacessomefile. So don't do this.
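
To see what the shell actually hands to a command, a small helper that prints each argument on its own line is enough. This is only a minimal sketch: show_args is a hypothetical name used for illustration, not something from the question.

```shell
# show_args is a hypothetical helper: print every argument inside angle brackets.
show_args() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

# Unquoted: the shell drops the backslashes and splits on the spaces.
show_args c:\path with spaces\somefile
# <c:path>
# <with>
# <spacessomefile>

# Single-quoted: one argument, backslashes intact.
show_args 'c:\path with spaces\somefile'
# <c:\path with spaces\somefile>
```

The function itself never sees the backslashes in the first call, because they are gone before it is invoked.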

Your posix function is wrong, again because it lacks double quotes around $1. Always put double quotes around variable and command substitutions: "$foo", "$(foo)". It's easier to remember this rule than the exceptions where you don't actually need the quotes.
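
The effect of the missing quotes can be checked by counting arguments. The sketch below assumes a made-up value containing spaces; count_args is a hypothetical helper.

```shell
# count_args is a hypothetical helper: print how many arguments it received.
count_args() {
  printf '%s\n' "$#"
}

value='c:/path with spaces'

# Quoted expansion: the value arrives as a single argument.
count_args "$value"   # prints 1

# Unquoted expansion: the value is split on whitespace first.
count_args $value     # prints 3
```

If the value also contained wildcard characters such as *, the unquoted form would additionally be subject to globbing, so the count would depend on the files in the current directory.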

echo does its own processing in some cases, and calling external processes is slow (especially on Windows). You can do the whole processing inside bash.

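posix () {
  # Replace every backslash with a forward slash (bash-specific expansion).
  local p="${1//\\//}" drive
  case "$p" in
    # Leading drive letter: lowercase it (${drive,} needs bash 4 or later)
    # and turn "c:" into "/c".
    ?:*) drive="${p:0:1}"; drive="${drive,}"; p="/$drive${p:2}";;
  esac
  printf %s "$p"
}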
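
A quick way to try out a pure-bash conversion of this kind is sketched below. It is an illustration under assumptions, not the answer's exact code: winpath_to_posix is a hypothetical name, the sample paths are made up, and bash 4 or later is assumed for the ${drive,} lowercase expansion.

```shell
# winpath_to_posix is a hypothetical name; requires bash 4+ for ${drive,}.
winpath_to_posix() {
  # Turn every backslash into a forward slash.
  local p="${1//\\//}" drive
  case "$p" in
    [A-Za-z]:*)
      # Lowercase the drive letter and replace "c:" with "/c".
      drive="${p:0:1}"
      p="/${drive,}${p:2}"
      ;;
  esac
  printf '%s\n' "$p"
}

winpath_to_posix 'C:\Users\me\notes.txt'   # /c/Users/me/notes.txt
winpath_to_posix 'relative\dir'            # relative/dir
```

The arguments are single-quoted at the call site, which is still required: the function only ever sees what the shell passes to it.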

The zsh feature that jw013 alluded to doesn't do what you seem to think it does. You can put noglob in front of a command, and zsh does not perform globbing (i.e. filename generation, i.e. expansion of wildcards) on the arguments. For example, in zsh, if you write noglob locate *foo*bar*, then locate is called with the argument *foo*bar*. You'd typically hide the noglob builtin behind an alias. This feature is irrelevant for what you're trying to do.

  • Thanks for the extremely thorough answer, and the big improvements to the function. It sounds like my use of "uninterpreted" was not clear, however. What I mean by that is the actual characters I typed on the command line (including \), but after splitting things into separate arguments. If I'm reading between the lines of these answers correctly, the problem may be that splitting into separate arguments happens after expansion and handling of quoting and so forth, and the shell offers no way of changing this fact. Is that correct? – iconoclast Sep 29 '12 at 03:27
  • @iconoclast Handling of quoting happens early on in expansion, the shell has to know e.g. that echo 'foo; bar' has the semicolon inside the argument to echo and not as a command separator. Expanding variable values happens later, and splitting into words even later. The “actual characters you typed on the command line” are only meaningful inside the shell where you typed them; if you had used a shell with different quoting rules, or no shell at all, these would have a different meaning, or there'd be no such thing. – Gilles 'SO- stop being evil' Sep 29 '12 at 12:42

While other answers may be correct in stating that you cannot receive "uninterpreted" shell input via the means they mention, they are wrong in categorically denying it as a possibility. You can certainly receive it before the shell interprets it if you instruct the shell not to interpret it. The humble POSIX heredoc makes this quite simple:

% sed -e 's@\\@/@g' -e 's@\(.\):\(.*\)@/drive/\1\2@' <<'_EOF_'     
> c:\some\stupid\windows\place
> _EOF_
/drive/c/some/stupid/windows/place

EDIT1:

In order to pass such a string to a shell function as a shell argument, you're going to need to store it in a shell variable. Unfortunately, you generally cannot simply write var=<<'HEREDOC', but POSIX does specify the -r argument to the read builtin:

% man read

POSIX PROGRAMMER'S MANUAL

...

By default, unless the -r option is specified, backslash ( '\' ) shall act as an escape character, as described in Escape Character (Backslash) . If standard input is a terminal device and the invoking shell is interactive, read shall prompt for a continuation line when:

  • The shell reads an input line ending with a backslash, unless the -r option is specified.

  • A here-document is not terminated after a new line is entered.

When combined, read and the heredoc make this a trivial and portable matter as well, though it may not feel very intuitive at first:

% _stupid_mspath_fix() { 
> sed -e 's@\\@/@g' -e 's@\(.\):\(.*\)@/drive/\1\2@' <<_EOF_
>> ${1}
>> _EOF_
> }
% read -r _stupid_mspath_arg <<'_EOF_'                    
> c:\some\stupid\windows\place
> _EOF_
% _stupid_mspath_fix "${_stupid_mspath_arg}"
/drive/c/some/stupid/windows/place
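
The effect of -r on its own can be seen by reading the same line twice. This is a minimal sketch with made-up variable names (plain and raw):

```shell
# Without -r, read treats each backslash as an escape character and drops it.
read plain <<'EOF'
c:\some\windows\place
EOF

# With -r, the backslashes are kept exactly as they appear in the input.
read -r raw <<'EOF'
c:\some\windows\place
EOF

printf '%s\n' "$plain"   # c:somewindowsplace
printf '%s\n' "$raw"     # c:\some\windows\place
```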

EDIT2:

You probably noticed the difference between the two heredocs in the second example. The _EOF_ terminator of the heredoc inside the function is unquoted, while the one fed to read is quoted with single quotes. This instructs the shell to perform expansion on the heredoc whose terminator is unquoted, and to leave the one whose terminator is quoted alone. Expanding the unquoted heredoc in the function doesn't break anything, because the value the variable expands to is inserted as-is: the shell doesn't parse the result of an expansion a second time.
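
The two terminator styles can be compared side by side. The following is a small sketch; the variable names win_path, expanded and literal are made up for the illustration.

```shell
win_path='c:\some\windows\place'

# Unquoted terminator: $win_path is expanded, and the backslashes in its
# value pass through untouched.
expanded=$(cat <<EOF
$win_path
EOF
)

# Quoted terminator: the body is taken literally, so nothing is expanded.
literal=$(cat <<'EOF'
$win_path
EOF
)

printf '%s\n' "$expanded"   # c:\some\windows\place
printf '%s\n' "$literal"    # $win_path
```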

Probably what you want to do involves piping your Windows path from the output of one command into the input of another dynamically. Command substitution within a heredoc makes this possible:

% _stupid_mspath_fix() { 
> sed -e 's@\\@/@g' -e 's@\(.\):\(.*\)@/drive/\1\2@' <<_EOF_
>> ${1}
>> _EOF_
> }
% read -r _stupid_mspath_arg <<'_EOF_'                    
> c:\some\stupid\windows\place
> _EOF_
% _stupid_mspath_fix "${_stupid_mspath_arg}"
/drive/c/some/stupid/windows/place
% read -r _second_stupid_mspath_arg <<_EOF_
> $(printf '%s' "${_stupid_mspath_arg}")
> _EOF_
% _stupid_mspath_fix "${_second_stupid_mspath_arg}"
/drive/c/some/stupid/windows/place

So basically if you can reliably output the backslashes from some application (I used printf above), then running that command within $(...) and enclosing that within an unquoted heredoc passed to another application that can reliably accept the backslashes as input (such as read and sed above) will bypass the shell's parsing of your backslashes altogether. Whether or not the applications can handle the backslashes as input/output is something you'll have to find out for yourself.

Not strictly relevant to the question:

In Gilles's answer he recommends the ${var/search/replace} parameter expansion form, which, though cool, is not POSIX. It is definitely a bashism. It wouldn't matter to me, but in his edits he retained the posix () function name, and that may be misleading to some.

On that note, the original post's posix () function makes use of the very convenient extended regex sed -r argument, but that is also, unfortunately, not POSIX. POSIX does not specify an extended regex argument for sed, and its use is therefore possibly unreliable.
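
For comparison, a conversion that sticks to basic regular expressions might look like the sketch below. posix_path_bre is a hypothetical name, and because the \L case conversion in the original function is a GNU sed extension as well, this version leaves the drive letter in its original case.

```shell
# posix_path_bre is a hypothetical name. It uses neither -r nor \L, so the
# drive letter keeps whatever case it had in the input.
posix_path_bre() {
  # First expression: backslashes to slashes. Second: "X:" at the start to "/X".
  printf '%s\n' "$1" | sed -e 's|\\|/|g' -e 's|^\(.\):|/\1|'
}

posix_path_bre 'C:\Users\me'   # /C/Users/me
```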

My account at Stack Overflow is only a few days old as well, but I've posted a few answers there dealing specifically with POSIX parameter expansion which you can find linked to from my profile page, and in which I include quotes from the POSIX guidelines and links thereto. You will also find a few in which I demonstrate other uses of the heredoc, such as reading an entire shell script into a shell variable, parsing and manipulating it programmatically, then finally running its new version, all done from within another script or shell function. Just saying.

mikeserv
  • Amazing answer! It might take me a while to absorb it all. – iconoclast Nov 26 '13 at 02:08
  • I'm trying to do the same - that's why I wrote it. I figure it's time I learned how to script, so I'm slowly working my way through writing the answers to questions I find interesting in the hopes that I'll learn something in the meanwhile. – mikeserv Nov 26 '13 at 04:32

You can't have the shell process "uninterpreted" input. What you type at the command line is just a string of characters precisely intended to be interpreted by the shell. You can't have the shell pass your literal, typed, characters to a function or command, because it has to interpret them to know what command/function to invoke and with what arguments! When you type something at the command prompt, you have to accept that you have to type according to the rules of shell interpretation (because you're using a shell!).

The rules of shell interpretation are very helpfully chosen. Rather than getting in your way, they give you a way of instructing the shell to do anything you want, including specifying arbitrary arguments to things. Without interpretation, there's no way for the shell to extract an action to execute precisely from your input. Interpretation implies special characters (like space), and without escaping, there would be no way to pass those characters to a command.

Regarding the interpretation itself, I've always found the bash documentation on this to be very good. Briefly, things like tilde expansion happen very early (and tilde expansion only in certain situations). Then, variable substitution and word splitting happen, and finally globbing.
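
The ordering described above can be observed directly. The sketch below is only an illustration: demo_expansion_order is a hypothetical name, the file names are made up, and it assumes the widely available mktemp -d for a scratch directory.

```shell
# demo_expansion_order is a hypothetical name; the body runs in a subshell
# so the directory change does not leak out.
demo_expansion_order() (
  scratch=$(mktemp -d) || exit 1
  cd "$scratch" || exit 1
  touch a.txt b.txt

  home_ref='~'
  pattern='*.txt'

  # Tilde expansion happened before the variable was substituted,
  # so the tilde coming out of the variable stays literal.
  printf '[%s]\n' $home_ref     # [~]

  # Globbing happens after variable substitution, so the unquoted
  # pattern is matched against the files in the directory.
  printf '[%s]\n' $pattern      # [a.txt] then [b.txt]

  # Quoting the expansion suppresses splitting and globbing.
  printf '[%s]\n' "$pattern"    # [*.txt]

  cd / && rm -rf "$scratch"
)

demo_expansion_order
```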

Nicholas Wilson