
I have folders containing DPX images and I'd like to be able to check that the file naming is sequential.

The file names can range from:

Frame 0000000.dpx to Frame 9999999.dpx

Folders are unlikely to contain this full range and could start and end at any of the numbers within the sequence above. The start will always be a lower number than the end.

Any help would be greatly appreciated :-)

2 Answers


Brute force method.

Showing sample directory contents:

/tmp/dpx-test
-rw-------.  1 root root    0 Feb 23 21:02 0
-rw-------.  1 root root    0 Feb 23 18:59 0000000
-rw-------.  1 root root    0 Feb 23 21:03 0000000.aaa
-rw-------.  1 root root    0 Feb 23 18:57 0000000.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000001.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000002.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000003.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000004.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000005.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000006.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000007.dp
-rw-------.  1 root root    0 Feb 23 18:58 0000008.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000009.dpx
-rw-------.  1 root root    0 Feb 23 21:00 000000x.dpx
-rw-------.  1 root root    0 Feb 23 20:56 0000011.dpx
-rw-------.  1 root root    0 Feb 23 18:59 0000019.dpx
-rw-------.  1 root root    0 Feb 23 21:02 0000022.dpy
drwx------.  2 root root    6 Feb 23 19:05 x
-rw-------.  1 root root    0 Feb 23 21:00 x000999.dpx
-rw-------.  1 root root    0 Feb 23 18:59 xxxx
[user1:/dpx-test:]#
[user1:/tmp/dpx-test:]#
[user1:/tmp/dpx-test:]# ls -1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9].dpx | wc -l
11
[user1:/tmp/dpx-test:]#
[user1:/tmp/dpx-test:]#
[user1:/tmp/dpx-test:]# ls -1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9].dpx
0000000.dpx
0000001.dpx
0000002.dpx
0000003.dpx
0000004.dpx
0000005.dpx
0000006.dpx
0000008.dpx
0000009.dpx
0000011.dpx
0000019.dpx

This is the script:

#!/bin/bash
D1="$1"  # Name of folder to check is passed in as argument #1
EXT='dpx'
echo "Checking for all missing ???????.dpx files."; echo
pushd "${D1}" || exit 1   # Stop if the folder cannot be entered.
      # The gap check below only runs when three or more '???????.dpx' files are in the given directory.
      # Gets first numbered file
      FstDPX="$(find . -type f -name '[0-9][0-9][0-9][0-9][0-9][0-9][0-9].dpx' | sort | head -1 | cut -d '/' -f2 | cut -d'.' -f1)"
      # Gets last numbered file
      LstDPX="$(find . -type f -name '[0-9][0-9][0-9][0-9][0-9][0-9][0-9].dpx' | sort | tail -1 | cut -d '/' -f2 | cut -d'.' -f1)"
      echo "First known file is:  ${FstDPX}.${EXT}"
      echo "Last  known file is:  ${LstDPX}.${EXT}"
      DPXcount="$(find . -type f -name "[0-9][0-9][0-9][0-9][0-9][0-9][0-9].${EXT}" | wc -l)"
      echo "Total number of '???????.dpx' files in $(pwd), is:  ${DPXcount}.";  echo
      if [ "${DPXcount}" -ge 3 ]; then
           # Convert value without leading zeros, and manually increment by 1(Fst) or 0(Lst).
           [ "${FstDPX}" == '0000000' ] && Fdpx="$(echo "${FstDPX}" | awk '{print $0 + 1}')" \
                                        || Fdpx="$(echo "${FstDPX}" | awk '{print $0 + 0}')"
           echo "FstDPX(${FstDPX}) ---- Fdpx(${Fdpx})  //First one to test for existance of//."
           Ldpx="$(echo "${LstDPX}" | awk '{print $0 + 0}')"
           echo "LstDPX(${LstDPX}) ---- Ldpx(${Ldpx})."
           IDX="${Fdpx}"  # Established starting point to iterate through.
           echo "IDX(${IDX}) -- Fdpx(${Fdpx}) -- Ldpx(${Ldpx})"; echo
           echo "Now iterating through the directory and listing only those missing."; echo
           # Loop through UNTIL we've reached the end.
           until [ "${IDX}" -gt "${Ldpx}" ]; do
                # Convert back to number with leading zeros.
                IDXz=$(printf "%07d" "${IDX}")
                # Test whether the file exists; report it if it does not.
                [ ! -e "${IDXz}.${EXT}" ] && echo "File  '${IDXz}.dpx'  is missing"
                let "IDX=IDX+1"
           done
      else
           echo; echo "Not enough '???????.dpx' files to process."; echo
      fi
popd

Which produces this output:

Checking for all missing ???????.dpx files.

/tmp/dpx-test ~
First known file is:  0000000.dpx
Last  known file is:  0000019.dpx
Total number of '???????.dpx' files in /tmp/dpx-test, is:  11.

FstDPX(0000000) ---- Fdpx(1)  //First one to test for existance of//.
LstDPX(0000019) ---- Ldpx(19).
IDX(1) -- Fdpx(1) -- Ldpx(19)

Now iterating through the directory and listing only those missing.

File  '0000007.dpx'  is missing
File  '0000010.dpx'  is missing
File  '0000012.dpx'  is missing
File  '0000013.dpx'  is missing
File  '0000014.dpx'  is missing
File  '0000015.dpx'  is missing
File  '0000016.dpx'  is missing
File  '0000017.dpx'  is missing
File  '0000018.dpx'  is missing
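
The question's files are named like "Frame 0000000.dpx", while the script above looks for bare seven-digit names. A minimal sketch of the same gap check with a filename prefix follows. The function name list_missing and its default arguments are assumptions for illustration, not part of the script above, and it takes the first and last frame from the files that are present.

```shell
#!/bin/bash
# Sketch: list gaps in a numbered sequence such as "Frame 0000000.dpx".
# list_missing is a hypothetical helper; prefix and extension defaults are assumptions.
list_missing() {
    local dir="$1" prefix="${2-Frame }" ext="${3-dpx}"
    local f num name first last i
    local -a nums=()

    # Collect the 7-digit number from every matching file name.
    for f in "$dir"/"$prefix"[0-9][0-9][0-9][0-9][0-9][0-9][0-9]."$ext"; do
        [ -e "$f" ] || continue        # the glob matched nothing
        name="${f##*/}"                # strip the directory part
        num="${name#"$prefix"}"        # strip the prefix
        num="${num%."$ext"}"           # strip the extension
        nums+=("$num")
    done
    if [ "${#nums[@]}" -eq 0 ]; then
        echo "No matching files in $dir" >&2
        return 1
    fi

    # Globs expand in sorted order, so with zero padding the first and last
    # entries are the lowest and highest frames. 10# forces base 10 so that
    # values such as 0000008 are not parsed as invalid octal.
    first=$((10#${nums[0]}))
    last=$((10#${nums[${#nums[@]}-1]}))
    for ((i = first; i <= last; i++)); do
        printf -v name '%s%07d.%s' "$prefix" "$i" "$ext"
        [ -e "$dir/$name" ] || echo "File '$name' is missing"
    done
}
```

It could be called as `list_missing /path/to/folder`, or with an explicit prefix, e.g. `list_missing /path/to/folder 'Shot '`.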

  • Hi Joseph, thank you so much for this it works a treat! It might be brute force but I like the output that is returned :-)

    Is it possible to modify to take account of the leading Frame (plus space) in the filename? Or perhaps ignore anything preceding the 7 digits that are being interrogated for sequence?

    – jim_e_jib Feb 24 '23 at 21:30

#!/usr/bin/perl

use strict;
use warnings;

# change to the directory in the first arg (defaults
# to .) and get a sorted list of all numbered files
# ending in .dpx into array @files
my $path = shift // '.';
chdir($path) or die "Can't chdir to '$path': $!\n";
opendir(my $dir, '.') or die "Can't open '$path': $!\n";
my @files = sort grep { /^\d+\.dpx$/ } readdir($dir);
closedir($dir);
die "No .dpx files found in '$path'\n" unless @files;

# get the numeric value of the first and last
# element of the array
my ($first) = split /\./, $files[0];
my ($last)  = split /\./, $files[-1];

#print "$first\n$last\n";

# find and print any missing filenames
foreach my $i ($first..$last) {
  my $f = sprintf("%08i.dpx",$i);
  print "File '$f' is missing\n" unless -e $f
};

Save that as, say, find-missing.pl, and make it executable with chmod +x find-missing.pl.

First, I need to randomly create a bunch of matching files for a test run (ten or fewer files is enough for this test):

$ for i in {0..9} ; do
    [ "$RANDOM" -gt 16384 ] && printf "%08i.dpx\0" "$i" ;
  done | xargs -0r touch

$ ls -l *.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000000.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000001.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000003.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000005.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000006.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000007.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000008.dpx

In bash, $RANDOM gives a random number between 0 and 32767...so the for loop has a roughly 50% chance of creating any file. In this run, you can see that all but 00000002.dpx, 00000004.dpx, and 00000009.dpx were created.
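
If a repeatable test set is wanted, bash reseeds its generator when a value is assigned to RANDOM, so the same seed yields the same pattern of files on a given bash version. The following is only a sketch; the function name make_test_frames and its arguments are made up for illustration.

```shell
#!/bin/bash
# Sketch: create a reproducible, randomly gapped set of test files.
# make_test_frames is a hypothetical helper, not part of the answer above.
make_test_frames() {
    local dir="$1" seed="$2" count="${3:-10}"
    local i name

    RANDOM="$seed"                     # assigning to RANDOM seeds the generator
    for ((i = 0; i < count; i++)); do
        # RANDOM is in 0..32767, so this test passes about half the time.
        if [ "$RANDOM" -gt 16383 ]; then
            printf -v name '%08d.dpx' "$i"
            touch "$dir/$name"
        fi
    done
}
```

For example, `make_test_frames /tmp/dpx-test 42 10` would create the same subset of 00000000.dpx to 00000009.dpx each time it is run against an empty directory with that seed.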

Then run the perl script:

$ ./find-missing.pl .
File '00000002.dpx' is missing
File '00000004.dpx' is missing

NOTE: It doesn't mention 00000009.dpx because that is beyond the highest-numbered file found. If you want it to do that, then either hard-code $last to a suitable value, or take it from a command-line argument.
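
As a rough illustration of the command-line idea in the note, here is a bash sketch that checks an explicit first and last frame number instead of deriving them from the files found. The function name check_range and its argument order are assumptions; the eight-digit format matches the test files in this run.

```shell
#!/bin/bash
# Sketch: report gaps across an explicit frame range rather than the range
# implied by the files present. check_range is a hypothetical helper name.
check_range() {
    local dir="$1" first="$2" last="$3"
    local i name missing=0

    # 10# forces base 10, so arguments with leading zeros are accepted.
    for ((i = 10#$first; i <= 10#$last; i++)); do
        printf -v name '%08d.dpx' "$i"     # 8 digits, as in the test files above
        if [ ! -e "$dir/$name" ]; then
            echo "File '$name' is missing"
            missing=$((missing + 1))
        fi
    done
    echo "Number of missing files: $missing"
}
```

With the test files from this run, `check_range . 0 9` would be expected to report 00000009.dpx in addition to the two gaps shown above, since the upper bound no longer depends on the highest-numbered file found.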


Second version, for filenames beginning with the prefix "Frame " (including the trailing space). It also allows configuration by variables at the top of the script. BTW, there's no reason why these couldn't be taken from the command line (in array @ARGV, or with a module like Getopt::Std or Getopt::Long):

#!/usr/bin/perl

use strict;
use warnings;

# Configuration variables
my $digits = 7;
my $prefix = 'Frame ';
my $suffix = '.dpx';

# Format string for printf
my $fmt = "$prefix%0${digits}i$suffix";

# Change to the directory in the first arg (defaults to
# the current dir, ".") and get a sorted list of all
# files starting with $prefix and ending in $suffix
# into array @files
my $path = shift // '.';
chdir($path) or die "Can't chdir to '$path': $!\n";
opendir(my $dir, '.') or die "Can't open '$path': $!\n";
my @files = sort grep { /^$prefix.*$suffix$/ } readdir($dir);
closedir($dir);
die "No matching files found in '$path'\n" unless @files;

# Get the numeric value of the first and last
# element of the array by removing the filename
# prefix (e.g. "Frame ") and suffix (e.g. ".dpx"):
my ($first, $last);
($first = $files[0]) =~ s/^$prefix|$suffix$//g;
($last = $files[-1]) =~ s/^$prefix|$suffix$//g;

#print "$first\n$last\n";

# find and print any missing filenames
foreach my $i ($first..$last) {
  my $f = sprintf($fmt, $i);
  print "File '$f' is missing\n" unless -e $f
};

BTW, ($first = $files[0]) =~ s/^$prefix|$suffix$//g; is a common perl idiom for assigning a value to a variable and modifying it with a substitution s/// operation. It's equivalent to:

$first = $files[0];
$first =~ s/^$prefix|$suffix$//g;

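To print the total number of files (and the number of missing files too), change the final block of code (everything from # find and print any missing filenames onwards) in the second version above to the following. The first version has no $fmt variable, so there use the literal format "%08i.dpx" in the sprintf call instead: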

# find and print any missing filenames
my $missing = 0;
foreach my $i ($first..$last) {
  my $f = sprintf($fmt, $i);
  if (! -e $f) {
    print "File '$f' is missing\n";
    $missing++;
  };
};

printf "\nTotal Number of files: %i\n", scalar @files;
printf "Number of missing files: %i\n", $missing;

That will produce output like this:

$ ./find-missing2.pl 
File 'Frame 00000002.dpx' is missing
File 'Frame 00000003.dpx' is missing

Total Number of files: 7
Number of missing files: 2

  • Hi cas, thank you so much for this it works a treat! I made a slight modification as my files only have seven numeric characters :-)

    Is it possible to modify to take account of the leading Frame (plus space) in the filename? Or perhaps ignore anything preceding the 7 digits that are being interrogated for sequence?

    – jim_e_jib Feb 24 '23 at 21:27
  • I'm not sure what you mean by "Leading Frame (plus space) in the filename" - do you mean that the filenames are actually Frame 0000000.dpx? yes, that's easy to deal with. I'll edit my answer to show how. – cas Feb 25 '23 at 00:29
  • Hi cas, thank you again for this - yes, that is exactly it! I have added back in your commented out print of the first and last frames, and also managed to calculate and print a total number of files.

    Is it possible to use a wildcard for the prefix variable? I have realised there might be some instances where something other than Frame is used. I have tried inserting .*, but it takes that literally.

    – jim_e_jib Feb 25 '23 at 21:22
  • Ah yes, except my calculation of the total number of files doesn't account for any that are missing, as it is just $last - $first... I will try to find out how to count and print the number of files ending in .dpx... – jim_e_jib Feb 25 '23 at 23:43
  • counting the number of matching files is easy - they're already in array @files, and getting a count of elements in an array is trivial. counting the number of missing files is only slightly more difficult, just increment a counter variable whenever a missing file is noticed. – cas Feb 26 '23 at 00:06
  • That's awesome cas - thank you ever so much for your help with this! – jim_e_jib Feb 26 '23 at 21:35
  • No problem. BTW, as an alternative to a simple counter variable, if you want a list of missing filenames in an array (for later use in an expanded version of the script - there's nothing you can do in a shell script that you can't do easier in a perl script), declare the variable as an array instead of a scalar variable (my @missing = (); rather than my $missing = 0;) and instead of incrementing it with $missing++ add the missing filename to the array (push @missing, $f;). As with @files, you can still easily get the count by evaluating it in scalar context with scalar @missing. – cas Feb 27 '23 at 01:51
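
For anyone staying with bash, the array idea from the last comment, combined with the earlier request to ignore whatever precedes the seven digits, might look roughly like the sketch below. The function name find_gaps and the global array name missing are assumptions for illustration. Because the prefix is ignored, gaps are reported as frame numbers rather than full filenames.

```shell
#!/bin/bash
# Sketch: collect missing frame numbers into the global array "missing",
# ignoring whatever precedes the final 7 digits of each *.dpx name.
# find_gaps is a hypothetical helper; the 7-digit width is an assumption.
find_gaps() {
    local dir="$1" f name i num
    local re='([0-9]{7})\.dpx$'
    local -a seen=() idx=()
    missing=()                         # global result array

    for f in "$dir"/*.dpx; do
        name="${f##*/}"
        # Capture the 7 digits immediately before ".dpx".
        if [[ -e "$f" && "$name" =~ $re ]]; then
            seen[$((10#${BASH_REMATCH[1]}))]=1
        fi
    done

    idx=("${!seen[@]}")                # indices come back in ascending order
    [ "${#idx[@]}" -gt 0 ] || return 1

    for ((i = idx[0]; i <= idx[${#idx[@]}-1]; i++)); do
        if [ -z "${seen[i]-}" ]; then
            printf -v num '%07d' "$i"
            missing+=("$num")
        fi
    done
    return 0
}
```

After calling `find_gaps /path/to/folder`, the count is available as `${#missing[@]}` and the frame numbers as `"${missing[@]}"`.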