I have a Reference file that contains:

a
b
c
d

I need to recursively check every file in a subfolder and delete each file that contains all the lines of the Reference file consecutively.

For example, if a file contains:

y
z
a
b
c
d
w
1

the file should be deleted.

But if a file contains:

y
z
a
b
3
c
d
w
1
2

it should not be deleted.

Porcupine
  • 1,892

2 Answers

If using Perl is an option, here is a small script that does the job for one file. It reads the reference file and the input file, then tries to replace every occurrence of the reference text with an empty string. If the length changes, it writes the result to an output file (the input filename with .out appended). Call it with the reference and input filenames as command-line arguments.

#!/usr/bin/perl
use strict;
use warnings;

# Read a whole file into a single string.
sub readfile {
  my ($filename) = @_;
  open(my $fh, '<', $filename) or die "cannot open file $filename: $!";
  local $/;    # slurp mode: no input record separator
  my $content = <$fh>;
  close($fh);
  return $content;
}

# Write a string to a file, replacing any existing contents.
sub writefile {
  my ($filename, $content) = @_;
  open(my $fh, '>', $filename) or die "cannot open file for writing: $filename: $!";
  print $fh $content;
  close($fh);
}

my $txtref = readfile($ARGV[0]);
my $txtin = readfile($ARGV[1]);

my $txtout = $txtin;
# \Q...\E quotes the reference so it is matched literally, not as a regex.
$txtout =~ s/\Q$txtref\E//g;

# Compare lengths numerically (!=), not as strings (ne).
if (length($txtin) != length($txtout)) {
  print STDOUT "changes, length ".length($txtin)." => ".length($txtout)."\n";
  my $outf = $ARGV[1].".out";
  writefile($outf, $txtout);
} else {
  print STDOUT "no changes\n";
}

To operate on a directory's contents, call the script from a shell loop, for example one driven by find.
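
As a rough sketch of such a loop in Python rather than shell, the following walks a directory tree and reports the files whose contents include the reference text verbatim. The function name find_matching_files and its arguments are made up for illustration. It only lists candidates and deletes nothing.

```python
import os


def find_matching_files(root, reference_path):
    """Return paths under root whose bytes contain the reference file's bytes."""
    with open(reference_path, "rb") as fh:
        reference = fh.read()
    if not reference:
        # An empty reference would "match" every file, so report nothing.
        return []
    reference_abs = os.path.abspath(reference_path)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip the reference file itself if it lives under root.
            if os.path.abspath(path) == reference_abs:
                continue
            with open(path, "rb") as fh:
                if reference in fh.read():
                    matches.append(path)
    return sorted(matches)
```

Calling os.remove(path) on each returned path would then delete the files; reviewing the list first is the safer order.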

tonioc
  • 2,069
  • Thanks for the answer. Maybe I was not clear; please see the edit. I just want to identify files that contain the contents of the Reference file at any position and, if so, delete the entire file. – Porcupine Sep 23 '18 at 10:27

Try:

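find /path/to -type f ! -name 'reference_file' -exec python3 -c '
import os, sys
ref = open("/path/to/reference_file", "rb").read()
if ref in open(sys.argv[1], "rb").read(): print(sys.argv[1] + ": can be deleted")' {} \;

Each file name is passed to Python as sys.argv[1] rather than pasted into the code, so names containing quotes cannot break the command. Once you are happy with the result, replace the print(...) call with os.remove(sys.argv[1]) to actually delete the files.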
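
One caveat with a plain substring test: it can also match when the reference text begins in the middle of a line (a line "xa" followed by b, c, d would count). If whole-line matching matters, a line-by-line comparison is one way to tighten it. The sketch below is only an illustration, and the names contains_block and file_contains_reference are invented here.

```python
def contains_block(lines, block):
    """True if the list `block` appears in `lines` as consecutive whole lines."""
    n = len(block)
    if n == 0:
        return False  # treat an empty reference as "no match" to be safe
    return any(lines[i:i + n] == block for i in range(len(lines) - n + 1))


def file_contains_reference(path, reference_path):
    """Compare two files line by line, ignoring line-ending differences."""
    with open(reference_path, "rb") as fh:
        block = fh.read().splitlines()
    with open(path, "rb") as fh:
        lines = fh.read().splitlines()
    return contains_block(lines, block)
```

Reading in binary mode avoids decoding errors on non-text files, and splitlines() drops the line terminators, so a missing final newline in either file does not affect the result.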

αғsнιη
  • 41,407