I have a big directory tree with source files (*.c), some (but not all) of which are actually generated by a preprocessing step that produces whatever.c from a whatever.qc file.

Often I need to do something with these files, but only with the ultimate source files: if there is a whatever.qc, there is no point in looking at whatever.c (but if there is no whatsit.qc, then I do need to look at whatsit.c).

Starting from something like this:

find data-utils -name '*.qc' -o -name '*.c' | xargs grep SMS_GEN

and let's assume that part of the output of find is:

data-utils/whatever.c
data-utils/whatsit.c
data-utils/whatever.qc

Is there an existing tool I can use to filter the output of find so that I don't pass the whatever.c files to xargs (or whatever follows the find)? That is, the filtered result from above should be:

data-utils/whatsit.c
data-utils/whatever.qc

Or am I going to need to write something from scratch?

Find the files with a .c filename suffix, but return the pathname of the corresponding .qc file if such a file exists:

find server/data-utils -type f -name '*.c' -exec sh -c '
    for pathname do
        if [ -f "${pathname%.c}.qc" ]; then
            printf "%s\n" "${pathname%.c}.qc"
        else
            printf "%s\n" "$pathname"
        fi
    done' sh {} +

This finds the pathnames of all regular files in or under the server/data-utils search path that have names ending in .c. For each batch of these pathnames, a short shell script is called. The script replaces the .c suffix of each pathname with .qc and, if that modified pathname refers to an existing regular file (or to a symbolic link to one), prints it. Otherwise, it prints the original pathname.
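One consequence worth noting (an observation, not part of the original answer): because find only matches *.c here, a .qc file whose generated .c does not exist yet is never printed. If that case matters, a sketch that instead starts from the question's `-name '*.qc' -o -name '*.c'` selection, and drops a .c only when its .qc sibling exists, might look like this. The function name `list_ultimate_sources` is hypothetical.

```shell
# Hypothetical helper: print the "ultimate" sources under DIR, i.e. every
# .qc file plus every .c file that has no .qc sibling with the same stem.
list_ultimate_sources() {
    find "$1" -type f \( -name '*.qc' -o -name '*.c' \) -exec sh -c '
        for pathname do
            # A .c file is skipped when the matching .qc source exists;
            # everything else (all .qc files, orphan .c files) is printed.
            case $pathname in
                *.c) [ -f "${pathname%.c}.qc" ] && continue ;;
            esac
            printf "%s\n" "$pathname"
        done' sh {} +
}
```

For example, `list_ultimate_sources data-utils` on the question's sample prints data-utils/whatsit.c and data-utils/whatever.qc, in whatever order find visits them.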


Just a variation of the above with the DRY principle applied:

find server/data-utils -type f -name '*.c' -exec sh -c '
    for pathname do
        qc_pathname=${pathname%.c}.qc

        if [ -f "$qc_pathname" ]; then
            out=$qc_pathname
        else
            out=$pathname
        fi

        printf "%s\n" "$out"
    done' sh {} +

... or even just

find server/data-utils -type f -name '*.c' -exec sh -c '
    for pathname do
        qc_pathname=${pathname%.c}.qc
        [ -f "$qc_pathname" ] && pathname=$qc_pathname
        printf "%s\n" "$pathname"
    done' sh {} +
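To feed the result straight into the question's `grep SMS_GEN` without going through xargs (which by default splits its input on blanks and newlines and treats quotes and backslashes specially), the compact variant can rebuild the positional parameters and call grep itself. This is a sketch, not part of the original answer; `grep_ultimate_sources` is a hypothetical name, and the pattern is handed to the inline script as its first argument.

```shell
# Hypothetical helper: grep_ultimate_sources PATTERN DIR
grep_ultimate_sources() {
    find "$2" -type f -name '*.c' -exec sh -c '
        pattern=$1
        shift
        # The for loop iterates over the original list, so "$@" can be
        # rebuilt in place: each .c is replaced by its .qc if one exists.
        for pathname do
            shift
            qc_pathname=${pathname%.c}.qc
            [ -f "$qc_pathname" ] && pathname=$qc_pathname
            set -- "$@" "$pathname"
        done
        grep -e "$pattern" "$@"' sh "$1" {} +
}
```

Usage: `grep_ultimate_sources SMS_GEN data-utils`. As with xargs, if find splits the pathnames into several batches, grep runs once per batch.
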

Kusalananda

I came up with an awk script to do the job. It is not particularly efficient (O(n^2) in the number of input lines), but it has the benefit of not reordering the lines it keeps.

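# filter.awk: drop a line ending in suffix A when another line has the
# same stem and ends in suffix B (set A and B with -v on the command line).
function supercedes(a, b,    suffixa, suffixb) {
        # Position of the final ".ext" (a dot followed by characters that
        # are neither "/" nor "."), or 0 if there is none.
        suffixa = match(a, /\.[^\/.]*$/);
        suffixb = match(b, /\.[^\/.]*$/);
        # No suffix, or stems of different lengths: not a pair.
        if ((suffixa == 0) || (suffixa != suffixb)) return 0;
        # The stems (up to and including the dot) must be identical.
        if (substr(a, 1, suffixa) != substr(b, 1, suffixb)) return 0;
        # a is superseded if it ends in suffix A and b ends in suffix B.
        return (substr(a, suffixa) == A) && (substr(b, suffixb) == B);
}
BEGIN { n = 0; }
{ item[n++] = $0; }
END {
        # Compare every line against every other line (hence O(n^2)).
        for (i = 0; i < n; ++i) {
                show = 1;
                for (j = 0; j < n; ++j) {
                        if (j != i && supercedes(item[i], item[j])) {
                                show = 0;
                                break;
                        }
                }
                if (show) print item[i];
        }
}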

Sample usage:

find server/data-utils -name '*.qc' -o -name '*.c' | \
    awk -f filter.awk -v A=.c -v B=.qc
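If the quadratic scan becomes a problem on a large tree, the same order-preserving filter can be done in roughly linear time: record, in an awk associative array, the stem of every line that ends in B, then replay the stored lines and drop those ending in A whose stem was recorded. This is a swapped-in technique, not the script above. The wrapper name `filter_superseded` is hypothetical, it matches A and B as literal trailing strings rather than by locating the last dot, and, like the original, it assumes one pathname per line.

```shell
# Hypothetical wrapper: filter_superseded A B
# Reads pathnames (one per line) on stdin and drops every line ending in
# suffix A whose stem also appears on some line ending in suffix B.
# Surviving lines keep their input order.
filter_superseded() {
    awk -v A="$1" -v B="$2" '
        BEGIN { la = length(A); lb = length(B) }
        {
            # Keep every line, in order, for replay in END.
            item[NR] = $0
            # Record the stem of each line that ends in B.
            len = length($0)
            if (len > lb && substr($0, len - lb + 1) == B)
                has_b[substr($0, 1, len - lb)] = 1
        }
        END {
            for (i = 1; i <= NR; i++) {
                line = item[i]
                len = length(line)
                # Skip a line ending in A only if its stem was seen with B.
                if (len > la && substr(line, len - la + 1) == A &&
                    (substr(line, 1, len - la) in has_b))
                    continue
                print line
            }
        }'
}
```

Usage: `find data-utils -name '*.qc' -o -name '*.c' | filter_superseded .c .qc | xargs grep SMS_GEN`.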