
The input is a set of file names like the following:

A-B-000001-C
A-B-000002-C
.....
.....
A-B-999999-C

All files should be sequential, and I want to find the missing file names in the sequence. To do this, I extract the 6-digit sequence number with awk and use grep with a regular expression to check whether the file is present in the directory.

`ls|grep "A-B-${sequencenumber}-.*"|wc -l`

But the shell script does not treat the number as decimal, and if I force it to be treated as decimal using 10#$sequencenumber, it removes the leading zeros, which are necessary for finding the file.

Is there any way around this?

5 Answers

5

Some shells such as bash in their arithmetic expressions treat numbers with leading zeros as octal. A trick to work around this is to manipulate only numbers with an extra nonzero leading digit, e.g. count from 1000001 to 1999999. To get the desired number with leading zeros, strip the leading 1 with a string operation.

n=1000001
while [ "$n" -le 1999999 ]; do
  digits=${n#1}              # strip the helper digit: 1000001 -> 000001
  set -- "A-B-$digits-"*     # positional parameters = matching file names
  if [ -e "$1" ] || [ -L "$1" ]; then
    echo "${digits}: $#"
  fi
  n=$((n+1))
done

This method is portable to all POSIX shells and avoids creating a subprocess for the computations, which can make it faster. A million iterations is likely to be slow anyway, since shells aren't the best at performance.

In the script above, instead of the complex and slow command involving ls and wc to count matching files, I use shell built-in constructs: set -- "A-B-$digits-"* sets the positional parameters to the list of matching files, and the following line prints the number of matches ($#) if there is at least one match. If there's no match, the pattern remains unchanged, so [ -e "$1" ] is [ -e "A-B-$digits-*" ], which is false.
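Since the question asks for the missing names rather than the ones that are present, the same loop can be inverted. The following is only a sketch under a few assumptions: the function name list_missing and its two arguments (first and last sequence number, given as plain decimal values without leading zeros) are made up for illustration, and the files are assumed to be named like A-B-000001-C as in the question.

```shell
# list_missing FIRST LAST (hypothetical helper, not part of the answer above):
# print every 6-digit sequence number in FIRST..LAST that has no A-B-NNNNNN-* file
# in the current directory. FIRST and LAST are plain decimals from 1 to 999999.
list_missing() {
  n=$((1000000 + $1))        # add the helper leading 1 so nothing is read as octal
  last=$((1000000 + $2))
  while [ "$n" -le "$last" ]; do
    digits=${n#1}            # strip the helper digit, keeping the zero padding
    set -- "A-B-$digits-"*   # stays a literal pattern when nothing matches
    if [ ! -e "$1" ] && [ ! -L "$1" ]; then
      printf '%s\n' "$digits"
    fi
    n=$((n + 1))
  done
}
```

For example, list_missing 1 999999 would scan the whole range from the question. The variables n, last and digits are global here, which is fine for a one-off script.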

2

You can use the printf utility for padding, e.g.,

$(ls|grep "A-B-$(printf '%06d' $sequencenumber)-.*"|wc -l)

Two points:

  • while both `foo` and $(foo) are standard, nested backticks tend not to be as portable
  • the printf utility is useful for more than just padding.
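One caveat worth sketching: printf '%06d' pads correctly when the value is a plain decimal, but an argument that still carries leading zeros (such as 000010) is itself read as octal by printf. A possible way around this, assuming the value arrives zero-padded, is to strip the zeros with parameter expansion before doing arithmetic and re-pad afterwards. The function name next_padded is hypothetical and only serves the illustration.

```shell
# next_padded NUM (hypothetical helper): print NUM + 1 as a 6-digit,
# zero-padded decimal, even when NUM has leading zeros.
next_padded() {
  num=$1
  zeros=${num%%[!0]*}        # the run of leading zeros, e.g. "0000" for "000010"
  num=${num#"$zeros"}        # "000010" -> "10", so it is no longer read as octal
  printf '%06d\n' "$(( ${num:-0} + 1 ))"   # ${num:-0} covers an all-zero input
}
```

This uses only POSIX parameter expansion and arithmetic, so it does not depend on the 10# prefix mentioned in the question.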
Thomas Dickey
  • 76,765
2

Using wc -l will not produce the right result if your file names contain newlines.

With bash and zsh, you can use brace expansion:

for n in {000001..999999}; do
  f=A-B-$n-C
  [ -f "$f" ] || printf '%s missing\n' "$f"
done

In ksh93 with the braceexpand option enabled:

for n in {1..999999%06d}; do
  : the code above
done

In ksh and zsh, you can do:

typeset -Z6 i=1
max=999999
while [ "$i" -le "$max" ]; do
  f=A-B-$i-C
  [ -f "$f" ] || printf '%s missing\n' "$f"
  : "$((i+=1))"
done

POSIXly:

min=1
max=999999
while [ "$min" -le "$max" ]; do
  f=$(printf "A-B-%0${#max}d-C" "$min")
  [ -f "$f" ] || printf '%s missing\n' "$f"
  : "$((min+=1))"
done
cuonglm
  • 153,898
  • ksh and zsh also have typeset -Z6 i. – Stéphane Chazelas Jan 10 '16 at 12:17
  • @StéphaneChazelas: Thanks for the information. Added it to the answer. – cuonglm Jan 10 '16 at 13:19
  • filenames containing newline are already wrong in the grep before you get to the wc -- but you can eliminate |wc -l by grep -c. With GNU you could do find . -maxdepth 1 -print0 | grep -zc. But I concur [ -f is just better if the C is fixed which Q doesn't promise; Gilles has a more general method for this. – dave_thompson_085 Jan 26 '24 at 03:35
1

With zsh, you can also do:

expected=( A-B-{000001..999999}-C )
  actual=( A-B-<->-C(N)    )
 missing=( ${expected:|actual}    )
if (( $#missing )) print -rlu2 - "$#missing files are missing:" ' - '$^missing

You can also list the unexpected ones such as those where the number is not within the 1..999999 range or expressed with a number of digits other than 6:

unexpected=( ${actual:|expected} )
if (( $#unexpected )) print -rlu2 - "$#unexpected unexpected files:" ' - '$^unexpected
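For shells without zsh's array subtraction, a comparable set difference can be sketched with comm, which is a different technique from the one above. Everything here is an assumption made for illustration: the function name missing_between, its two arguments (first and last number as plain decimals), the fixed -C suffix, and the availability of mktemp, which is common but not required by POSIX. File names containing newlines are not handled.

```shell
# missing_between FIRST LAST (hypothetical helper): print the names expected
# in FIRST..LAST that are absent from the current directory.
missing_between() {
  expected=$(mktemp) || return 1
  actual=$(mktemp) || { rm -f "$expected"; return 1; }
  # Expected names, generated already sorted thanks to the zero padding.
  awk -v a="$1" -v b="$2" \
    'BEGIN { for (i = a; i <= b; i++) printf "A-B-%06d-C\n", i }' > "$expected"
  # Actual names; the glob expands in sorted order.
  for f in A-B-[0-9][0-9][0-9][0-9][0-9][0-9]-C; do
    if [ -e "$f" ]; then printf '%s\n' "$f"; fi
  done > "$actual"
  comm -23 "$expected" "$actual"   # lines only in the expected list
  rm -f "$expected" "$actual"
}
```

Swapping the option to comm -13 would instead list the files that exist but fall outside the expected range.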
0

This works with awk for me (csh syntax):

set x=`echo $< | awk '{printf "%d",$1}' ` ; echo $x
033      # this is what I entered
33       # this prints out
Bayram
  • 1