5

I’m writing something that deals with file matches, and I need an inversion operation. I have a list of files (e.g. from find . -type f -print0 | sort -z >lst), and a list of matches (e.g. from grep -z foo lst >matches – note that this is only an example; matches can be any arbitrary subset (including empty or full) of lst), and now I want to invert this list, i.e. get every entry of lst that is not in matches.

Background: I’m sorta implementing something like find(1), except on file lists (although the files do exist in the filesystem at the point of calling, the list may have been pre-filtered). If the list of files weren’t potentially so large, I could use find "${files[@]}" -maxdepth 0 -somecondition -print0, but even moderate use of what I’m writing would go beyond the Linux or BSD argv size limit.

If the lines were not NUL-separated, I could use comm -23 lst matches >inverted. If the matches were not NUL-separated, I could use grep -Fvxzf matches lst. But, from the generators I mentioned in the first paragraph, both are.

Assume GNU tools are installed, so this need not be portable beyond e.g. Debian, as I’m using find -print0, sort -z and friends already (although some BSDs have them, so if it can be done more portably, I won’t complain).

I’m trying to do code reuse here; plus, comm -23 is basically the perfect tool for this already except it doesn’t support changing the input line separator (yet), and comm is an underrated and not-enough-well-known tool anyway. If the Unix/Linux toolbox doesn’t offer anything sensible, I’m likely to reimplement a form of comm -23 (reduced to just this one use case) in shell, as the script already (for other reasons) requires a shell that happens to support read -d '' for NUL-delimited input, but that’s going to be slow (and effort… I posted this at the end of the workday in the hopes someone has got an idea for when I pick this up tomorrow or on the 28th).

Rui F Ribeiro
mirabilos

2 Answers

6

If your comm supports non-text input (like GNU tools generally do), you can always swap NUL and nl (here with a shell supporting process substitution (have you got any plan for that in mksh btw?)):

comm -23 <(tr '\0\n' '\n\0' < file1) <(tr '\0\n' '\n\0' < file2) |
  tr '\0\n' '\n\0'

That's a common technique.
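For shells without process substitution, the same swap can be sketched with temporary files. This is only a sketch under assumptions: the function name invert_nul_list is hypothetical, mktemp -d is assumed to be available, and comm and sort are assumed to cope with NUL bytes inside lines (as the GNU versions do). It re-sorts after the swap in the C locale, because swapping NUL and newline can change the relative order of entries that contain newlines.

```shell
#!/bin/sh
# invert_nul_list LIST MATCHES   (hypothetical helper name)
# Prints the NUL-terminated entries of LIST that are absent from MATCHES.
invert_nul_list() {
    tmpdir=$(mktemp -d) || return 1
    # Swap NUL and newline so that comm sees one entry per line, then
    # re-sort bytewise: the swap may reorder entries containing newlines.
    tr '\0\n' '\n\0' <"$1" | LC_ALL=C sort >"$tmpdir/list"
    tr '\0\n' '\n\0' <"$2" | LC_ALL=C sort >"$tmpdir/matches"
    # comm -23 keeps lines unique to the first file; swap back afterwards.
    LC_ALL=C comm -23 "$tmpdir/list" "$tmpdir/matches" | tr '\0\n' '\n\0'
    rm -rf "$tmpdir"
}
```

Usage would be along the lines of invert_nul_list lst matches >inverted; the output is no longer in the original locale's sort order, so pipe it through sort -z if that matters.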

  • Hm yes, @Costas suggested the same in a comment 2 minutes earlier. Interesting idea, will have to try; sometimes, things need input from more than one person to truly shine, apparently (do you have a patch? it’s been on the wishlist for ages… the job management is the tricky part, not parsing). – mirabilos Dec 23 '15 at 21:42
  • 1
    @mirabilos, looks like other shells ignore the job management issue altogether (start them like background tasks in their own process group and don't care for them after they're started) and get away with it. – Stéphane Chazelas Dec 23 '15 at 21:50
-2

If you are using grep to find the matches, you can use grep's -v option to get the lines that do not match.
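A minimal sketch of that, assuming GNU grep (for -z) and that the matches really were produced by a grep pattern, which the question says is only an example. The function name grep_inverted_nul is hypothetical.

```shell
#!/bin/sh
# grep_inverted_nul PATTERN LIST   (hypothetical helper name)
# Prints the NUL-terminated entries of LIST that do NOT match PATTERN.
# -z makes grep treat NUL, not newline, as the record terminator;
# -v selects the non-matching records.
grep_inverted_nul() {
    grep -zv -e "$1" "$2"
}
```

For example, grep_inverted_nul foo lst >inverted would be the counterpart of grep -z foo lst >matches. It does not help when matches is an arbitrary subset that no single pattern describes.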

alexises