I’m writing something that deals with file matches, and I need an inversion operation. I have a list of files (e.g. from find . -type f -print0 | sort -z >lst), and a list of matches (e.g. from grep -z foo lst >matches; note that this is only an example: matches can be any arbitrary subset of lst, including empty or full), and now I want to invert this list, i.e. get every entry of lst that is not in matches.
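To have something concrete to test against, here is a minimal sketch that recreates the two inputs from the paragraph above in a throwaway directory. The file names are hypothetical. One of them deliberately contains a newline, which is the reason the lists are NUL-separated in the first place.

```shell
#!/usr/bin/env bash
# Recreate the question's two inputs in a throwaway directory.
# Hypothetical file names; one deliberately contains a newline, which is
# why the lists are NUL-separated rather than newline-separated.
set -euo pipefail

cd "$(mktemp -d)"
mkdir tree
touch tree/bar tree/foo1 tree/food $'tree/new\nline'

# All files, NUL-terminated and sorted. Listing "tree" instead of "."
# keeps the output file lst itself out of the listing.
find tree -type f -print0 | sort -z >lst

# One possible subset: entries containing "foo". grep exits with status 1
# when nothing matches, which would otherwise abort the script under set -e.
grep -z foo lst >matches || true

# Count records by counting NUL bytes (tr -cd deletes everything else).
printf 'lst: %s records, matches: %s records\n' \
    "$(tr -cd '\0' <lst | wc -c)" "$(tr -cd '\0' <matches | wc -c)"
```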
Background: I’m sorta implementing something like find(1), except on file lists (although the files do exist in the filesystem at the point of calling, the list may have been pre-filtered). If the list of files weren’t potentially so large, I could use find "${files[@]}" -maxdepth 0 -somecondition -print0, but even moderate use of what I’m writing would exceed the Linux or BSD argv size limit.
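For comparison, the argv-based approach described above might look like the following minimal sketch. It assumes bash 4.4 or later (for mapfile -d ''), GNU find, and uses -type f as a stand-in for the question's placeholder -somecondition; the list contents are hypothetical.

```shell
#!/usr/bin/env bash
# The argv-based approach from the question, for lists small enough to fit.
# Assumptions: bash >= 4.4 (mapfile -d ''), GNU find, and -type f standing
# in for the question's placeholder "-somecondition".
set -euo pipefail

cd "$(mktemp -d)"
touch a b 'c d'
mkdir e
printf '%s\0' ./a ./b './c d' ./e >lst   # hypothetical pre-filtered list

mapfile -d '' files <lst
if (( ${#files[@]} )); then
    # -maxdepth 0: test only the named paths, never descend into ./e.
    # Names start with "./", so none can be mistaken for a find option.
    find "${files[@]}" -maxdepth 0 -type f -print0 >result
else
    : >result   # with no paths, GNU find would default to "." instead
fi

# The whole list lands on a single command line, which the kernel caps;
# getconf reports that limit.
getconf ARG_MAX
```

The empty-list guard matters for this use case, since an empty match set is explicitly allowed.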
If the lines were not NUL-separated, I could use comm -23 lst matches >inverted. If the matches were not NUL-separated, I could use grep -Fvxzf matches lst (with -z, grep still reads its -f pattern file one pattern per line). But with the generators I mentioned in the first paragraph, both are.
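To make the two newline-based options above concrete, here is a small sketch on newline-separated inputs (hypothetical names, none containing a newline). It shows what comm -23 and grep -Fvxf each compute and that they agree when no NUL separation is involved.

```shell
#!/usr/bin/env bash
# The two newline-based tools the question mentions, on newline-separated
# inputs (hypothetical file names, no embedded newlines).
set -euo pipefail

cd "$(mktemp -d)"
printf '%s\n' ./bar ./foo1 ./food ./zed | sort >lst.nl
grep foo lst.nl >matches.nl

# comm -23: suppress column 2 (lines only in matches.nl) and column 3
# (lines in both), leaving lines unique to lst.nl. Both inputs must be
# sorted with the same collation.
comm -23 lst.nl matches.nl >inverted.comm

# grep: -F fixed strings, -x whole-line match, -v invert, -f patterns
# read from a file. This variant does not need sorted input.
grep -Fvxf matches.nl lst.nl >inverted.grep
```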
Assume GNU tools are installed, so this need not be portable beyond e.g. Debian, as I’m already using find -print0, sort -z and friends (although some BSDs have those too, so if it can be done more portably, I won’t complain).
I’m trying to do code reuse here; plus, comm -23 is basically the perfect tool for this already, except that it doesn’t support changing the input line separator (yet), and comm is an underrated and not-well-enough-known tool anyway. If the Unix/Linux toolbox doesn’t offer anything sensible, I’m likely to reimplement a form of comm -23 (reduced to just this one use case) in shell, as the script already (for other reasons) requires a shell that happens to support read -d '' for NUL-delimited input. But that’s going to be slow (and a lot of effort… I posted this at the end of the workday in the hope that someone has an idea for when I pick this up tomorrow or on the 28th).
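A minimal sketch of the reduced comm -23 mentioned above, in bash using read -d ''. The function name invert_nul is hypothetical. It relies on an assumption stronger than comm's: every entry of matches also occurs in lst in the same relative order, which holds when matches was produced by filtering lst. Under that assumption only equality tests are needed, so locale collation never comes into play; a leftover, unconsumed match is reported as an error rather than silently producing wrong output.

```shell
#!/usr/bin/env bash
# A reduced "comm -23" for NUL-separated lists, in pure bash (read -d '').
# Assumption, stronger than comm's: every entry of MATCHES also occurs in
# LST, in the same relative order. That holds when MATCHES was produced by
# filtering LST (e.g. grep -z foo lst >matches), and it means only equality
# tests are needed, so locale collation never comes into play.
# Also assumed: every record is NUL-terminated, as sort -z and grep -z emit.
# Usage (hypothetical helper name): invert_nul LST MATCHES >inverted
invert_nul() {
    local entry match fd have_match=0
    exec {fd}<"$2"                   # MATCHES is read in step with LST
    if IFS= read -r -d '' match <&"$fd"; then have_match=1; fi
    while IFS= read -r -d '' entry; do
        if (( have_match )) && [[ $entry == "$match" ]]; then
            # In both lists: drop it and advance to the next match.
            if IFS= read -r -d '' match <&"$fd"; then
                have_match=1
            else
                have_match=0
            fi
        else
            printf '%s\0' "$entry"   # only in LST: keep it
        fi
    done <"$1"
    exec {fd}<&-
    # A leftover match means MATCHES was not an in-order subset of LST.
    if (( have_match )); then
        printf 'invert_nul: %q not found (in order) in %s\n' "$match" "$1" >&2
        return 1
    fi
}

# Demo on hypothetical names, one containing a newline.
cd "$(mktemp -d)"
printf '%s\0' ./bar ./foo1 ./food $'./new\nline' ./zed | sort -z >lst
grep -z foo lst >matches
invert_nul lst matches >inverted
```

Each record still costs one read builtin call, so this remains slow on large lists, as expected. If matches may contain names that are absent from lst, a real ordered comparison (as comm does) would be needed instead of the equality walk.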
os.walk(). But the same principle applies. – muru Dec 23 '15 at 21:08
set will. – muru Dec 23 '15 at 21:10
comm with 2 inverts: comm -23 <(tr '\n\0' '\0\n' <lst) <(tr '\n\0' '\0\n' <matches) | tr '\n\0' '\0\n' – Costas Dec 23 '15 at 21:34
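Costas's "comm with 2 inverts" can be tried end to end with a sketch like the following (hypothetical file names, one containing a newline). The first two tr calls swap every newline with a NUL and vice versa, so each NUL-terminated record becomes a newline-terminated line for comm, while a newline inside a name turns into a NUL that comm treats as an ordinary byte. The final tr swaps the bytes back.

```shell
#!/usr/bin/env bash
# Costas's "comm with 2 inverts" on NUL-separated lists.
# tr '\n\0' '\0\n' exchanges newlines and NULs, so records become lines
# that comm can handle; the last tr restores the original bytes.
set -euo pipefail

cd "$(mktemp -d)"
printf '%s\0' ./bar ./foo1 ./food $'./new\nline' ./zed | sort -z >lst
grep -z foo lst >matches

comm -23 <(tr '\n\0' '\0\n' <lst) <(tr '\n\0' '\0\n' <matches) \
    | tr '\n\0' '\0\n' >inverted
```

One caveat: because the swap changes bytes inside names that contain newlines, the relative order of such names could in rare cases differ from what sort -z produced, in which case comm may complain that its input is not in sorted order.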