
Suppose I have a file that contains, among many other things,

\command{arg1,arg2,arg3}

(the arguments being paths, made up of /, ., letters and digits)

but that a user may just as well write it as

\command{arg1,
arg2 ,
arg3
}

That is, on several lines and with superfluous spaces.

I'd like to find a regular expression to include in a shell script so that n variables will contain the n arguments. How should I proceed?


I managed to write

echo "\command{arg1,
    arg2 ,
    arg3
    }" | sed -n -e 's/\\command//p' | sed 's/,/\n/' | sed 's/{\|}//'

but that only outputs arg1, and I'm not even sure how to store it in a variable.

Related:

But I was not able to combine all those ingredients to get what I want.

Clément

2 Answers


I'd like to find a regular pattern to include in a shell script so that n variables will contain the n arguments

The following creates a shell array arglist that contains each of the arguments:

$ readarray -t arglist < <(echo "\command{arg1,
    arg2 ,
    arg3
    }" | sed -n '/\\command/{ :a;/}/!{N;b a}; s/\\command{//; s/[ \n}]//g; s/,/\n/g; p}')

By using the declare statement, we can see that it worked:

$ declare -p arglist
declare -a arglist='([0]="arg1" [1]="arg2" [2]="arg3")'

Here is another example with the arguments on one line:

$ readarray -t arglist < <(echo "\command{arg1, arg2, arg3, arg4}"  | sed -n '/\\command/{ :a;/}/!{N;b a}; s/\\command{//; s/[ \n}]//g; s/,/\n/g; p}')

Again, it works:

$ declare -p arglist
declare -a arglist='([0]="arg1" [1]="arg2" [2]="arg3" [3]="arg4")'
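
Since the question mentions a file rather than an echo, here is a minimal sketch of the same approach reading from a file, assuming bash and GNU sed. The temporary file and its contents are made up for illustration; only the sed script is taken from above.

```shell
#!/usr/bin/env bash
# Hypothetical input file, created here only so the sketch is self-contained.
tmpfile=$(mktemp)
printf '%s\n' 'intro text' '\command{arg1, arg2,' '  arg3}' 'closing text' > "$tmpfile"

# Same sed script as above, but reading the file directly instead of echo.
readarray -t arglist < <(sed -n '/\\command/{ :a;/}/!{N;b a}; s/\\command{//; s/[ \n}]//g; s/,/\n/g; p}' "$tmpfile")
rm -f "$tmpfile"

# The n arguments are now arglist[0] .. arglist[n-1].
echo "number of arguments: ${#arglist[@]}"
for i in "${!arglist[@]}"; do
  echo "argument $i: ${arglist[i]}"
done
```

Individual arguments can then be used as "${arglist[0]}", "${arglist[1]}", and so on, which is usually more convenient than n separately named variables.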

Note that the space in < <( is essential: the first < redirects input, and <(...) is a process substitution. Without the space, bash parses << as the here-document operator, which is something else entirely.
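
One reason for redirecting from a process substitution instead of piping into readarray: in bash with default settings (the lastpipe option off), the last command of a pipeline runs in a subshell, so an array filled there is gone once the pipeline ends. The sketch below illustrates the difference; the array names piped and kept are just for the demonstration.

```shell
#!/usr/bin/env bash
# Pipeline: readarray runs in a subshell, so the parent shell never sees the array.
unset piped
printf 'arg1\narg2\n' | readarray -t piped
echo "after pipe: ${#piped[@]} element(s)"

# Redirection from a process substitution: readarray runs in the current shell.
readarray -t kept < <(printf 'arg1\narg2\n')
echo "after < <(...): ${#kept[@]} element(s)"
```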

How it works

The sed command is a bit subtle. Let's look at it a piece at a time:

  • -n

    Don't print lines unless explicitly asked.

  • /\\command/{...}

    If we find a line that contains \command, then perform the commands found in the braces which are as follows:

  • :a;/}/!{N;b a}

    This appends lines to the pattern space until it contains a }. This way, we get the whole command in at once.

  • s/\\command{//

    Remove the \command{ string.

  • s/[ \n}]//g

    Remove all spaces, closing braces, and newlines.

  • s/,/\n/g

    Replace commas with newlines. When this is done, each argument is on a separate line which is what readarray wants.

  • p

    Print.
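
The same steps can also be laid out as a multi-line sed script with one command per line, which may be easier to read while learning sed. This is only a sketch, assuming bash and GNU sed (it relies on \n being understood inside the bracket expression and in the replacement); the sample input is made up.

```shell
#!/usr/bin/env bash
# Sample input; in practice this would come from the file being scanned.
input='some text
\command{arg1,
    arg2 ,
    arg3
    }
more text'

readarray -t arglist < <(printf '%s\n' "$input" | sed -n '
  /\\command/{
    # keep appending lines until the pattern space contains a closing brace
    :a
    /}/!{
      N
      b a
    }
    # drop the command name and the opening brace
    s/\\command{//
    # delete spaces, newlines and the closing brace
    s/[ \n}]//g
    # put each argument on a line of its own
    s/,/\n/g
    p
  }
')

declare -p arglist
```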

John1024
  • Wow, that is impressive. However, this solution does not seem to work if two arguments are on the same line. – Clément Jan 30 '15 at 23:42
  • @Clément I updated the answer with a new algorithm that uses commas, not newlines, to divide the arguments. – John1024 Jan 30 '15 at 23:47
  • That works like a charm, and the generous (and detailed) explanations you gave will surely help me get a better insight into sed. Thanks again. – Clément Jan 30 '15 at 23:57
  • And what is this supposed to do: s/[ \n}]//g? What if the arguments contain spaces, newlines or otherwise? What if they contain quotes? – mikeserv Jan 31 '15 at 09:08
  • @mikeserv If we imagine that these were shell commands and that the arguments were Unix file names, that would be a serious issue and some method for including/escaping those characters (and more) would be needed. But, as can be inferred from the OP's command format, this question is about LaTeX. In my experience, LaTeX arguments do not have special characters. However, Clément, if your files do have commands whose arguments contain within them spaces, newlines, or closing braces, let me know about them and I'll update the answer. – John1024 Jan 31 '15 at 18:31
  • @John1024: you guessed correctly ;-). In fact, this regexp is here to find the bibliographies (the bib files) used in a document, so the arguments are virtually any path + filename. – Clément Feb 03 '15 at 00:25
  • @Clément My \bibliography commands include just simple file names (no paths) and kpathsea handles the rest. If it is possible that your file names or paths could contain spaces, newlines, or braces, then I need more details, such as how do those characters appear in the file? Are they escaped? .... – John1024 Feb 03 '15 at 05:26
  • I want to grasp both \bibliography and \addbibresource commands. – Clément Feb 03 '15 at 09:09
  • Handling all possible arguments would lead to a complicated situation (see 3.6.1 of the Biblatex doc. for an overview of the types of arguments accepted). We suppose the arguments are plain paths, without anything fancy (spaces, newlines, etc.), just [a-zA-Z], [0-9], . and /. So the current solution works like a charm. However, I should probably reconsider using kpathsea and web2c, but this is another subject ;). – Clément Feb 03 '15 at 09:20

With perl:

perl -l -0777 -ne '
  $n = 0;
  for (/\\command\{\s*(.*?)\s*\}/sg) {
    $n++;
    $i = 0;
    for $arg (split /\s*,\s*/, $_) {
      $arg =~ s/'\''/$&\\$&$&/g;
      print "arg${n}[$i]='\''$arg'\''";
      $i++;
    }
  }
  print "n=$n"' the-file

Would output something like:

arg1[0]='arg1'
arg1[1]='arg2'
arg1[2]='arg3'
n=1

Which you could evaluate like:

eval "$(perl ...)"

to create $arg1, $arg2... arrays for each of the \commands.
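
As a sketch of what the eval step does, the snippet below feeds the sample output shown above to eval and then walks over the resulting arrays. It assumes bash; the variable perl_output stands in for the real "$(perl ...)" substitution and is hard-coded here only so the example runs without an input file.

```shell
#!/usr/bin/env bash
# Stand-in for "$(perl ...)": the sample output from above, hard-coded.
perl_output="arg1[0]='arg1'
arg1[1]='arg2'
arg1[2]='arg3'
n=1"

# Evaluating the assignments creates the array arg1 and the counter n.
eval "$perl_output"

# Walk over arg1 .. argN using indirect expansion.
for ((k = 1; k <= n; k++)); do
  ref="arg${k}[@]"
  for a in "${!ref}"; do
    echo "command $k: $a"
  done
done
```

Because the perl script single-quotes every value and escapes embedded single quotes, evaluating its output should be safe even for arguments containing shell metacharacters.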

  • Congratulations on making 100000 and thanks for all the great answers! – iruvar Jan 30 '15 at 23:29
  • Perl is really handy and this is a nice solution that works seamlessly. However, I prefer to stick to sed (which I am trying to learn!). Thanks again, and congrats on the 100k too! – Clément Jan 31 '15 at 00:37