My system creates a new text file every time a certain event occurs.
The files should be named file_a.txt, file_b.txt, file_c.txt, etc.

In a Bash shell script, how can I find out which filename should be used next?

For instance, if file_a.txt and file_b.txt exist but not file_c.txt, then the next available filename is file_c.txt.

This could be a number if it is easier.
I started designing an algorithm but there is probably an easier way?

Note: Files get removed each day, so the probability of reaching z is zero. So, after z any strategy is acceptable: aa, using integers, or even using UUIDs.

  • 1
    What's the pattern for the file naming, just the next letter in the alphabet? What happens when it reaches z? – 123 Jun 22 '15 at 12:01

5 Answers

1

Here's a crude way (no error checking) to do it purely in bash:

#helper function to convert a number to the corresponding character
chr() {
  [ "$1" -lt 256 ] || return 1
  printf "\\$(printf '%03o' "$1")"
}

#helper function to convert a character to the corresponding integer
ord() {
  LC_CTYPE=C printf '%d' "'$1"
}

#increment file
fn_incr(){

  #first split the argument into its constituent parts

  local fn prefix letter_and_suffix letter suffix next_letter
  fn=$1
  prefix=${fn%_*}
  letter_and_suffix=${fn#${prefix}_}
  letter=${letter_and_suffix%%.*}
  suffix=${letter_and_suffix#*.}

  #increment the letter part
  next_letter=$(chr $(($(ord "$letter") + 1)))

  #reassemble
  echo "${prefix}_${next_letter}.${suffix}"
}

Example usage:

fn_incr foo_bar_A.min.js
#=> foo_bar_B.min.js

Doing it in-bash with multiple-letter indices would require longer code. You could always do it in a different executable, but then you might want to increment the filenames in batches or else the executable startup overhead might slow down your program unacceptably. It all depends on your use case.
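
For a sense of how much longer, here is a minimal sketch of a multiple-letter increment in pure bash. The function name next_alpha is hypothetical, and the sketch assumes the index consists only of lowercase ASCII letters; it carries leftwards the way an odometer does.

```bash
# Hypothetical helper: increment a lowercase alphabetic index, carrying
# leftwards like an odometer (a -> b, z -> aa, az -> ba).
# Assumes the index contains only the letters a-z.
next_alpha() {
  local alphabet=abcdefghijklmnopqrstuvwxyz
  local id=$1 out= carry=1 i ch rest pos
  for (( i = ${#id} - 1; i >= 0; i-- )); do
    ch=${id:i:1}
    if (( carry )); then
      rest=${alphabet%%"$ch"*}   # the letters that come before ch
      pos=${#rest}               # 0-based position of ch in the alphabet
      if (( pos == 25 )); then
        ch=a                     # z wraps around and the carry continues
      else
        ch=${alphabet:pos+1:1}   # otherwise take the next letter and stop carrying
        carry=0
      fi
    fi
    out=$ch$out
  done
  # A carry that survives the leftmost letter adds a new leading "a".
  if (( carry )); then
    out=a$out
  fi
  printf '%s\n' "$out"
}
```

Calling next_alpha with the letter part extracted by fn_incr would then replace the single-character chr/ord step.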

Plain integers might be the better choice here, since shell arithmetic handles the carry for you (9 + 1 becomes 10) and you don't have to manage the overflow into the next position by hand.
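
A minimal sketch of the integer variant, assuming names of the form file_1.txt, file_2.txt, and so on. The function name next_numbered and its two optional arguments are hypothetical. It returns the first number that is not taken, so it fills gaps left by removed files, and it is not safe against two processes racing for the same name.

```bash
# Hypothetical helper: print the first PREFIX<N>SUFFIX that does not exist yet.
# Defaults to the file_N.txt naming used in the question.
next_numbered() {
  local prefix=${1:-file_} suffix=${2:-.txt} n=1
  # Count upwards until a free name is found.
  while [[ -e $prefix$n$suffix ]]; do
    (( n++ ))
  done
  printf '%s\n' "$prefix$n$suffix"
}
```

Usage would be something like: next=$(next_numbered) followed by writing to "$next".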


chr() and ord() have been shamelessly stolen from "Bash script to get ASCII values for alphabet".

Petr Skocik
1

If you don't really care, on Linux (more precisely, with GNU coreutils):

tmpfile=$(TMPDIR=. mktemp)
… # create the content
mv --backup=numbered -- "$tmpfile" file.txt

This uses the GNU backup name scheme: file.txt, file.txt.~1~, file.txt.~2~, …
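
To make the resulting order concrete, here is a small sketch that wraps the two commands in a function. It assumes GNU mv and mktemp; the name save_event and its single argument (the content to store) are hypothetical. Note that file.txt always holds the newest content, while file.txt.~1~ holds the oldest.

```bash
# Sketch assuming GNU coreutils; save_event is a hypothetical name.
# Stores its first argument as the newest file.txt, pushing the previous
# file.txt aside to the next free file.txt.~N~ backup name.
save_event() {
  local tmpfile
  tmpfile=$(TMPDIR=. mktemp) || return
  printf '%s\n' "$1" > "$tmpfile"
  mv --backup=numbered -- "$tmpfile" file.txt
}
```

After save_event one, save_event two and save_event three, file.txt contains "three", file.txt.~1~ contains "one" and file.txt.~2~ contains "two".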

Another relatively compact way, with numbers that can be placed in a more convenient place, is to take advantage of zsh's glob qualifiers to find the latest file, and calculate the next file with some parameter expansion.

latest=(file_<->.txt(Nn[-1]))
if (($#latest == 0)); then
  next=file_1.txt
else
  latest=$latest[1]
  next=${${latest%.*}%%<->}$((${${latest%.*}##*[^0-9]}+1)).${latest##*.}
fi
mv -- $tmpfile $next

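With any POSIX shell, you'll have an easier time if you use a number with leading zeros, so that the glob's lexical order matches numeric order. Take care that an integer literal with a leading zero is parsed as octal in shell arithmetic, which is why the function below prepends a 1 to the digits before adding and strips it afterwards.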

move_to_next () {
  shift $(($#-2))
  case ${1%.*} in
    *\*) mv -- "$2" file_0001.txt;;
    *)
      set -- "${1%.*}" "${1##*.}" "$2"
      set -- "${1%_*}" "$((1${1##*_}+1)).$2" "$3"
      mv -- "$3" "${1}_${2#1}";;
  esac
}
move_to_next file_[0-9]*.txt "$tmpfile"
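
The octal pitfall mentioned above can also be avoided in bash specifically (this is not POSIX sh) with the 10# base prefix. The following is a sketch only; the function name next_padded is hypothetical, and it assumes the argument is a non-empty string of decimal digits.

```bash
# Hypothetical helper (bash-specific): increment a zero-padded decimal
# counter while keeping its width.
next_padded() {
  local id=$1
  # Without 10#, a value such as 0008 would be rejected as invalid octal,
  # and 0010 would silently be read as 8.
  printf '%0*d\n' "${#id}" "$(( 10#$id + 1 ))"
}
```

For example, next_padded 0008 prints 0009 and next_padded 0099 prints 0100. Once the counter outgrows its width (9999 becomes 10000), the lexical glob order no longer matches numeric order.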
0

This outputs the next sequential filename. The ID can be any length, and it can be either numeric or alphabetic. This sample is primed to use an alphabetic ID, the first ID being a.

pfix='file_'
sfix='.txt' 
idbase=a        # 1st alpha id when no files exist - use a decimal number for numeric id's 
idpatt='[a-z]'  # alpha glob pattern - use '[0-9]' for numeric id's
shopt -s extglob
idhigh=$( ls -1 "$pfix"+($idpatt)"$sfix" 2>/dev/null |
             awk  'length>=l{ l=length; 
                   id=substr($0,'${#pfix}'+1,length-'${#pfix}-${#sfix}') } 
                   END{ print id }' )
[[ -z $idhigh ]] && echo "$pfix$idbase$sfix" ||
   perl -E '$x="'$idhigh'"; $x++; print "'${pfix}'"."$x"."'${sfix}'\n"'

If no matching file exists, the output is:

file_a.txt

If the highest matching file is file_zzz.txt, the output is:

file_aaaa.txt
Peter.O
0

Try:

perl -le 'print $ARGV[-1] =~ s/[\da-zA-Z]+(?=\.)/++($i=$&)/er' file*.txt

That will give you file_10.txt after file_9.txt, file_g.txt after file_f.txt, and file_aa.txt after file_z.txt. It will not give you file_ab.txt after file_aa.txt or file_11.txt after file_10.txt, because the file* shell glob sorts lexically: file_z.txt comes after file_aa.txt, and file_9.txt after file_10.txt, so the last argument is not the highest ID.

The numeric case can be worked around in zsh by using file*.txt(n) instead of file*.txt.

Or you can define a numeric sort order in zsh, based on those aa, abc being recognised as numbers in base 36:

b36() REPLY=$((36#${${REPLY:r}#*_}))
perl ... file_*.txt(no+b36)

(note that the order is ...7, 8, 9, a/A, b/B..., z/Z, 10, 11... so you don't want to mix file_123.txt and file_aa.txt).
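
In bash, which has no glob qualifiers, the ordering problem described above could be sidestepped by picking the highest ID explicitly: compare by length first and alphabetically second, so that z sorts before aa and 9 before 10. This is only a sketch; the function name highest_id is hypothetical, and it assumes the IDs are either all lowercase letters or all digits without leading zeros.

```bash
# Hypothetical helper: print the highest existing ID between PREFIX and SUFFIX.
# Longer IDs win; IDs of equal length are compared as strings in the C locale.
highest_id() {
  local LC_ALL=C
  local prefix=${1:-file_} suffix=${2:-.txt} f id best=
  for f in "$prefix"*"$suffix"; do
    [[ -e $f ]] || continue            # the glob matched nothing
    id=${f#"$prefix"}
    id=${id%"$suffix"}
    if (( ${#id} > ${#best} )) ||
       { (( ${#id} == ${#best} )) && [[ $id > $best ]]; }; then
      best=$id
    fi
  done
  printf '%s\n' "$best"
}
```

The single file name "file_$(highest_id).txt" could then be passed to the perl one-liner in place of the glob. If no file matches, the function prints an empty line, which the caller has to handle.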

0

This problem can be solved handily in Python using the iterator building blocks available in the itertools module:

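from os.path import isfile
from string import ascii_lowercase
from itertools import dropwhile, chain, product, count

# Python 3: the built-in map is lazy, so itertools.imap is not needed.
# Candidate IDs are generated as a, b, ..., z, aa, ab, ... and the first
# name that is not an existing file is printed.
print(next(dropwhile(isfile, map('file_{}.txt'.format,
    map(''.join, chain.from_iterable(
    product(ascii_lowercase, repeat=x) for x in count(1)))))))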
iruvar