6

The following script searches files with the suffix .tex in a directory (i.e. TeX files), for the string \RequireLuaTeX, i.e. LuaTeX files in that directory, and creates a Bash array from the results.

It then runs the command latexmk on the files in that array.

I'd like to exclude a list of user defined files from this array, possibly declared as an array thus:

excludedfiles=(foo.tex bar.tex baz.tex)

I'm writing to solicit suggestions for clean ways to do this.

I quite like the approach of putting everything in an array. For one thing, it makes it easy to list the files before running commands on them. But I'm willing to consider other approaches.

#!/bin/bash
## Get LuaTeX filenames
mapfile -t -d "" filenames < <(grep -Z -rL "\RequireLuaTeX" *.tex)

## Run latexmk on PDFTeX files.
for filename in "${filenames[@]}"
do
    base="${filename%.*}"
    rm -f "$base".pdf
    latexmk -pdf -shell-escape -interaction=nonstopmode "$base".tex
done

BACKGROUND AND COMMENTS:

TeX users may be confused by my question. So I'm explaining here what I was trying to do, and how I miswrote the question. I'm not changing it, because the change would invalidate the existing answers and create confusion.

I have a collection of LaTeX files. The older ones use PDFLaTeX. The newer ones mostly use LuaLaTeX. This question is about the PDFLaTeX ones. What I'm trying to do in my script is

a) Create a list of PDFLaTeX files. My LuaLaTeX files contain the string "\RequireLuaTeX" in them. Therefore, files which do not contain that string are PDFLaTeX files.

So, I am trying to create a list of LaTeX files which do not contain the string "\RequireLuaTeX" in them.

b) Run PDFLaTeX on them using latexmk.

My question has the following error. I wrote:

The following script searches files with the suffix .tex in a directory (i.e. TeX files), for the string \RequireLuaTeX, i.e. LuaTeX files in that directory, and creates a Bash array from the results.

In fact I want files which do not contain that string, because as explained above, those correspond to my PDFLaTeX files.

Faheem Mitha
  • I may be wrong here, but doesn't latexmk override the .pdf files by itself? Is there some special reason for the rm? – Quasímodo Dec 13 '20 at 12:16
  • @Quasímodo Good point. I think I wanted to force a complete rebuild for some reason. Possibly that doesn't make sense. I'll review that, thanks. Note that AFAIK latexmk won't do a rebuild if the PDF file is newer than all the source files. It acts like make in that respect. – Faheem Mitha Dec 13 '20 at 16:11

6 Answers

7

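Grep's -L flag lists the files that do not match the pattern. To list the files that contain \RequireLuaTeX, as the question describes, you want -l instead. Also, Grep needs to see a double backslash in the pattern to match a single literal backslash.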
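To see the two flags side by side, here is a small sketch that runs in a scratch directory. The file names lua.tex and pdf.tex are made up for the demonstration, and it assumes a grep that supports -L (GNU and BSD grep both do).

```shell
#!/bin/bash
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files: one contains \RequireLuaTeX, one does not.
printf '%s\n' '\RequireLuaTeX' '\documentclass{article}' > lua.tex
printf '%s\n' '\documentclass{article}' > pdf.tex

# Single quotes pass both backslashes to grep, which then matches one literal "\".
with_string=$(grep -l '\\RequireLuaTeX' -- *.tex)     # files that contain the string
without_string=$(grep -L '\\RequireLuaTeX' -- *.tex)  # files that do not

printf 'contains: %s\nlacks:    %s\n' "$with_string" "$without_string"
```

An alternative to doubling the backslash is grep -F, which treats the pattern as a fixed string.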

Since you are in Bash, let us get hold of some useful constructs.

#!/bin/bash -
shopt -s globstar extglob
mapfile -t -d "" filenames < <(grep -Zl '\\RequireLuaTeX' ./**/!(foo|bar|baz).tex)
rm -f "${filenames[@]/%.tex/.pdf}"
latexmk -pdf -shell-escape -interaction=nonstopmode "${filenames[@]}"
  • **/!(foo|bar|baz).tex expands to all files in the current directory tree that end in .tex but whose basename is not foo.tex, bar.tex nor baz.tex. Both globstar and extglob are required for this operation.

  • "${filenames[@]/%.tex/.pdf}" expands to all elements of the array, substituting each trailing .tex by .pdf.

Since Latexmk can be given multiple files as arguments, we could skip for-loops.
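As a quick illustration of the array substitution used for the rm command, here is a sketch with a made-up file list (the names are hypothetical, chosen to include a nested path and a space):

```shell
#!/bin/bash
# Hypothetical file list, including a nested path and a name with a space.
filenames=(./a.tex ./sub/b.tex "./my notes.tex")

# The % anchors the match at the end of each element,
# so only a trailing .tex is replaced by .pdf.
pdfs=("${filenames[@]/%.tex/.pdf}")

printf '%s\n' "${pdfs[@]}"
```

Because the expansion is quoted, each element stays a single word even when it contains spaces.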

Quasímodo
6

With zsh, you can turn an array into a pattern that matches any of its elements: escape the glob characters in each element with the b parameter expansion flag, then join the elements with | using the j[|] parameter expansion flag:

#! /bin/zsh -
set -o extendedglob
excluded_file_names=(foo.tex bar.tex baz.tex)
excluded_file_names_pattern="(${(j[|])${(@b)excluded_file_names}})"

# here using the ~ extendedglob operator to apply the exclusion
tex_files=( ./**/(*.tex~$~excluded_file_names_pattern) )

files=( ${(0)"$(grep -lZF '\RequireLuaTeX' $tex_files)"} )
rm -f ${files/%tex/pdf}
latexmk -pdf -shell-escape -interaction=nonstopmode $files

Or you could use the e glob qualifier to check if the tail of the file path is in the array:

#! /bin/zsh -
excluded_file_names=(foo.tex bar.tex baz.tex)

tex_files=( ./**/*.tex(^e['(($excluded_file_names[(Ie)$REPLY:t]))']) )

files=( ${(0)"$(grep -lZF '\RequireLuaTeX' $tex_files)"} )
rm -f ${files/%tex/pdf}
latexmk -pdf -shell-escape -interaction=nonstopmode $files

5

The way I approach this kind of problem is to turn the list of file names/patterns into a hash that has instant lookup with no searching required. (Note that the excludedFiles patterns such as z*.tex are expanded as part of the assignment, not as part of the hashing loop. For example, if there are three files matching the z*.tex glob, then excludedFiles will contain three entries rather than the one pattern, and the hashing loop will iterate three times.)

# User configurable list of files and patterns
excludedFiles=(foo.tex bar.tex baz.tex z*.tex)

# Convert the list into a hash
declare -A excludedHash
for excludedFile in "${excludedFiles[@]}"
do
    [[ -e "$excludedFile" ]] && excludedHash[$excludedFile]=yes
done

# Processing
for filename in "${filenames[@]}"
do
    [[ -n "${excludedHash[$filename]}" ]] && continue    # Skip if filename is in hash

    base="${filename%.*}"
    rm -f "$base".pdf
    latexmk -pdf -shell-escape -interaction=nonstopmode "$base".tex
done
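If you would rather keep everything in an array, so that the files can be listed before anything is run on them, the same lookup can build a filtered array first. This is only a sketch: the kept array and the sample paths are my own illustration, it looks files up by basename (an assumption that suits paths returned by a recursive search), and it needs bash 4 or later for declare -A.

```shell
#!/bin/bash
# Hypothetical inputs: an exclusion list and a list of found files with paths.
excludedFiles=(foo.tex bar.tex baz.tex)
filenames=(./alpha.tex ./sub/foo.tex ./beta.tex ./baz.tex)

# Build the lookup table (the existence check is skipped in this demo).
declare -A excludedHash
for excludedFile in "${excludedFiles[@]}"; do
    excludedHash[$excludedFile]=yes
done

# Collect the survivors, comparing on the basename only.
kept=()
for filename in "${filenames[@]}"; do
    [[ -n "${excludedHash[${filename##*/}]}" ]] && continue
    kept+=("$filename")
done

# Review the list before running anything on it.
printf '%s\n' "${kept[@]}"
```

The later rm and latexmk steps can then iterate over "${kept[@]}" instead of "${filenames[@]}".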

Kusalananda
Chris Davies
  • Note that the z*.tex is expanded once, when you set excludedFiles=(...), not against basenames of filenames[@] (a hash lookup couldn't do that anyway). That's fine for one directory, but does take O(n) work to list the directory and match the glob against every one. (And involves system calls.) Still, it does avoid a combinatorial explosion of work that you'd get if you naively match each filename against each of multiple patterns, and is probably a performance win if multiple of the excludedFiles entries are fixed strings. – Peter Cordes Dec 13 '20 at 14:34
  • @PeterCordes yes, absolutely on all counts. I've amended the answer to try and make that explicitly clear – Chris Davies Dec 13 '20 at 18:12
2

With bash, one option is GLOBIGNORE:

The GLOBIGNORE shell variable may be used to restrict the set of file names matching a pattern. If GLOBIGNORE is set, each matching file name that also matches one of the patterns in GLOBIGNORE is removed from the list of matches. If the nocaseglob option is set, the matching against the patterns in GLOBIGNORE is performed without regard to case. The filenames . and .. are always ignored when GLOBIGNORE is set and not null. However, setting GLOBIGNORE to a non-null value has the effect of enabling the dotglob shell option, so all other filenames beginning with a '.' will match. To get the old behavior of ignoring filenames beginning with a '.', make '.*' one of the patterns in GLOBIGNORE. The dotglob option is disabled when GLOBIGNORE is unset.

So the following will not include foo.tex, bar.tex or baz.tex in the wildcard expansion:

GLOBIGNORE=foo.tex:bar.tex:baz.tex
grep ... *.tex
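As a rough sketch of how the excludedfiles array from the question could feed GLOBIGNORE, the following joins the elements with colons and checks the result in a scratch directory. The sample file names and the matched array are made up, and it assumes the excluded names contain no colons or glob characters.

```shell
#!/bin/bash
# Work in a throwaway directory with some hypothetical files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch foo.tex bar.tex baz.tex keep.tex other.tex

excludedfiles=(foo.tex bar.tex baz.tex)

# Join the array with ":" in a subshell so the caller's IFS is untouched.
GLOBIGNORE=$(IFS=:; printf '%s' "${excludedfiles[*]}")

# The glob now skips every name listed in GLOBIGNORE.
matched=( *.tex )

# Unset afterwards, since a non-null GLOBIGNORE also enables dotglob.
unset GLOBIGNORE

printf '%s\n' "${matched[@]}"
```

The array matched can then be passed to grep in place of the bare *.tex glob.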

GNU grep also has its own ways for excluding files and directories, such as --exclude-from to take a file containing list of globs to exclude:

grep --exclude-from=<(printf "%s\n" "${excludedfiles[@]}") ...

Or --exclude to specify each glob individually:

declare -a grep_options
for f in "${excludedfiles[@]}"
do
    # add --exclude for each file
    grep_options+=(--exclude="$f")
done
grep "${grep_options[@]}" ...
muru
  • 1
    Beware, however, that GLOBIGNORE=foo.tex keeps foo.tex out of the expansion of *.tex, but does not keep ./foo.tex out of the expansion of ./*.tex, nor dir/foo.tex out of the expansion of **/*.tex. The same applies to GNU grep's --exclude*. Those features are only usable to exclude some file extensions with patterns like *.bak, *~... See ksh's FIGNORE for a slightly better design. In bash, you generally need to revert the setting of dotglob. See also there – Stéphane Chazelas Dec 13 '20 at 09:45
  • 1
    Also note that if the filenames you want to exclude contain wildcard characters or : or \, you'll need to escape them, and since bash has no equivalent to zsh's b parameter flag and since its ${var//pat/repl} (contrary to that of ksh93/zsh) has no way to recall the matched part in the replacement, that's going to be quite painful. – Stéphane Chazelas Dec 13 '20 at 10:55
  • @muru: I just wanted to confirm that you are suggesting two distinct ways to exclude a list of files here. Though you don't have a complete script in either case. – Faheem Mitha Jan 24 '21 at 11:03
  • @Faheem yes, two distinct ways: GLOBIGNORE in bash and --exclude in grep. I believe the snippets given here are sufficient for understanding them; writing the scripts is left as an exercise for the reader. – muru Jan 24 '21 at 11:28
0

I suggest a simple for-loop without arrays:

excludedfiles="foo.tex|bar.tex|baz.tex"

for i in $(ls *.tex | egrep -vx "${excludedfiles}");do
  # single quotes so grep receives \\ and matches a literal backslash
  filename=$(grep -H '\\RequireLuaTeX' "$i" | awk -F ':' '{print $1}')
  base=${filename%.*}
  if [[ "$base" == "" ]];then continue; fi
  rm -f "$base".pdf
  latexmk -pdf -shell-escape -interaction=nonstopmode "$base".tex
done

What it does:

  1. Lists the *.tex files
  2. Filters out specified files (egrep)
  3. Text-search specified pattern \RequireLuaTeX ("\\" to include \ in search)
  4. Checks for empty variable filename and skips to next if empty (happens if grep doesn't find a match)
  5. Finish with given commands

I recommend dry-running the script without the final rm and latexmk commands and verifying the output by echoing "$base". I can imagine problems with spaces in filenames (with every solution).

...
  if [[ "$base" == "" ]];then continue; fi
  echo $base
done
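To make the concern about spaces concrete, here is a small sketch that counts loop iterations in a scratch directory. The file names are made up; it compares looping over the output of ls with looping over the glob directly.

```shell
#!/bin/bash
# Work in a throwaway directory with two hypothetical files,
# one of which has a space in its name.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 'my notes.tex' plain.tex

# Looping over $(ls ...) splits on whitespace, so the name with a space
# is seen as two separate words.
split_count=0
for i in $(ls *.tex); do
    split_count=$((split_count + 1))
done

# Looping over the glob keeps each file name as one word.
glob_count=0
for i in *.tex; do
    glob_count=$((glob_count + 1))
done

printf 'ls loop: %d iterations, glob loop: %d iterations\n' "$split_count" "$glob_count"
```

If file names with spaces are possible, iterating over the glob (and quoting "$i") avoids the problem.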

You can pull out all search-patterns and put them in a variable for better adjustment and easy-handling if you wish.

0

Although I'm impressed by the reduced linecount and improved efficiency of other answers, I'd like to offer a different approach which prioritizes simplicity of code. This approach will make your future maintenance easier and reduce your errors and your time spent testing.

In practice, for cases like this, there is no noticeable difference in running time and the benefits of a reduced linecount are more than offset by the increased complication of the code involved, making it harder to find errors by code review in the shorter code.

#!/bin/bash

dryrun=false
[[ $1 = -n ]] && dryrun=true

bye() { >&2 printf "abort: %s\n" "$*"; exit 1; }

for f in *.tex ;do
    [[ $f = foo.tex || $f = bar.tex || $f = baz.tex ]] && continue
    # -F: match \RequireLuaTeX as a fixed string, backslash included
    grep -qF '\RequireLuaTeX' "$f" && continue
    if $dryrun ;then
        printf "dryrun: %s\n" "$f"
        continue
    fi
    rm -f "${f%.tex}.pdf" || bye rm
    latexmk -pdf -shell-escape -interaction=nonstopmode "$f" || bye latexmk
done

jrw32982