How to find the next available file suffix (file_a.txt file_b.txt etc)

Question

My system creates a new text file every time a certain event occurs.
The files should be named file_a.txt file_b.txt file_c.txt etc.

In a Bash shell script, how can I find out which filename should be used next?

For instance, if file_a.txt and file_b.txt exist but not file_c.txt, then the next available filename is file_c.txt.

The suffix could be a number instead of a letter if that is easier.
I started designing an algorithm but there is probably an easier way?

Note: Files get removed each day, so the probability of reaching z is zero. So, after z any strategy is acceptable: aa, using integers, or even using UUIDs.

What's the pattern for the file naming, just the next letter in the alphabet? What happens when it reaches z? – 123, Jun 22 '15 at 12:01

score 1 · Answer 1 · edited Apr 13 '17 at 12:36

Here's a crude way (no error checking) to do it purely in bash:

#helper function to convert a number to the corresponding character
chr() {
  [ "$1" -lt 256 ] || return 1
  printf "\\$(printf '%03o' "$1")"
}

#helper function to convert a character to the corresponding integer
ord() {
  LC_CTYPE=C printf '%d' "'$1"
}

#increment file
fn_incr(){

  #first split the argument into its constituent parts

  local fn prefix letter_and_suffix letter suffix next_letter
  fn=$1
  prefix=${fn%_*}
  letter_and_suffix=${fn#"${prefix}_"}   # quoted so glob characters in the prefix are taken literally
  letter=${letter_and_suffix%%.*}
  suffix=${letter_and_suffix#*.}

  #increment the letter part
  next_letter=$(chr $(($(ord "$letter") + 1)))

  #reassemble
  echo "${prefix}_${next_letter}.${suffix}"
}

Example usage:

fn_incr foo_bar_A.min.js
#=> foo_bar_B.min.js

Doing it in-bash with multiple-letter indices would require longer code. You could always do it in a different executable, but then you might want to increment the filenames in batches or else the executable startup overhead might slow down your program unacceptably. It all depends on your use case.

Using plain old integers might be the better choice here, as shell arithmetic then handles the carry for you (9 + 1 becomes 10), whereas with letters you have to manage the roll-over from z to aa yourself.
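
To illustrate the integer approach, here is a minimal sketch. It assumes names of the form file_N.txt in the current directory, with N a decimal number without leading zeros; next_numbered is a hypothetical helper name, not something from the question.

```shell
# Hypothetical helper: print the next free name <prefix><N><suffix> in the
# current directory, where N is a plain decimal integer starting at 1.
next_numbered() {
  local prefix=$1 suffix=$2 f n max=0
  for f in "$prefix"*"$suffix"; do
    n=${f#"$prefix"}
    n=${n%"$suffix"}
    case $n in
      '' | *[!0-9]*) continue ;;   # unexpanded pattern or non-numeric id
      0*) continue ;;              # assumption: ids carry no leading zeros
    esac
    # Numeric comparison, so 10 correctly beats 9.
    [ "$n" -gt "$max" ] && max=$n
  done
  printf '%s%s%s\n' "$prefix" "$((max + 1))" "$suffix"
}
```

Calling next_numbered file_ .txt prints file_1.txt in an empty directory. Note that nothing here protects against two processes picking the same name at the same time.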

chr() and ord() have been shamelessly stolen from "Bash script to get ASCII values for alphabet".

score 1 · Answer 2 · answered Jun 23 '15 at 00:08

If you don't really care about the exact naming scheme, on Linux (more precisely, with GNU coreutils):

tmpfile=$(TMPDIR=. mktemp)
… # create the content
mv --backup=numbered -- "$tmpfile" file.txt

This uses the GNU backup name scheme: file.txt, file.txt.~1~, file.txt.~2~, …

Another relatively compact way, with numbers that can be placed in a more convenient place, is to take advantage of zsh's glob qualifiers to find the latest file, and calculate the next file with some parameter expansion.

latest=(file_<->.txt(n[-1]N))   # N: expand to nothing when no file matches
if ((#latest == 0)); then
  next=file_1.txt
else
  latest=$latest[1]
  next=${${latest%.*}%%<->}$((${${latest%.*}##*[^0-9]}+1)).${latest##*.}
fi
mv -- $tmpfile $next

With any POSIX shell, you'll have an easier time if you use a number with leading zeros. Take care that an integer literal with a leading zero is parsed as octal.

move_to_next () {
  shift $(($#-2))
  case ${1%.*} in
    *\*) mv -- "$2" file_0001.txt;;
    *)
      set -- "${1%.*}" "${1##*.}" "$2"
      set -- "${1%_*}" "$((1${1##*_}+1)).$2" "$3"
      mv -- "$3" "${1}_${2#1}";;
  esac
}
move_to_next file_[0-9]*.txt "$tmpfile"
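
The arithmetic in the function above relies on a guard digit to sidestep the octal pitfall. Here is that step in isolation, as a small sketch; incr_padded is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: add 1 to a zero-padded decimal string, keeping its width.
incr_padded() {
  # Prefixing a 1 turns "0008" into 10008, which is never read as octal.
  set -- "$((1$1 + 1))"
  # Strip the guard digit again: 10009 -> 0009.
  printf '%s\n' "${1#1}"
}

incr_padded 0008   # prints 0009
incr_padded 0099   # prints 0100
```

Without the guard digit, $((0010 + 1)) evaluates to 9 because 0010 is octal, and $((0008 + 1)) is an error. The trick stops working once the counter overflows its width (9999 becomes 20000), so pick a width that cannot be reached.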

Peter.O · Answer 3 · 2015-06-23T11:44:56.927

This outputs the next sequential filename. The ID can be any length, and it can be either numeric or alphabetic. This sample is set up to use an alphabetic ID, with a as the first ID:

pfix='file_'
sfix='.txt' 
idbase=a        # 1st alpha id when no files exist - use a decimal number for numeric id's 
idpatt='[a-z]'  # alpha glob pattern - use '[0-9]' for numeric id's
shopt -s extglob
idhigh=$( ls -1 "$pfix"+($idpatt)"$sfix" 2>/dev/null |
             awk  'length>=l{ l=length; 
                   id=substr($0,'${#pfix}'+1,length-'${#pfix}-${#sfix}') } 
                   END{ print id }' )
[[ -z $idhigh ]] && echo "$pfix$idbase$sfix" ||
   perl -E '$x="'$idhigh'"; $x++; print "'${pfix}'"."$x"."'${sfix}'\n"'

If no matching file exists, the output is:

file_a.txt

If the highest matching file is file_zzz.txt, the output is:

file_aaaa.txt
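
The roll-over from zzz to aaaa comes from Perl's string increment. For comparison, here is a minimal sketch of the same carry logic in plain shell, assuming IDs made of lowercase a-z only; next_alpha_id is a hypothetical name, not part of the script above.

```shell
# Hypothetical helper: print the successor of a lowercase alphabetic id,
# with carry: a -> b, az -> ba, zzz -> aaaa.
next_alpha_id() {
  local id=$1 tail='' head last
  # Each trailing z wraps around to a and carries into the letter on its left.
  while [ "${id%z}" != "$id" ]; do
    id=${id%z}
    tail=a$tail
  done
  if [ -z "$id" ]; then
    # Every letter was z, so the id grows by one letter.
    printf 'a%s\n' "$tail"
    return
  fi
  head=${id%?}
  last=${id#"$head"}
  # Map the last remaining letter to its successor.
  last=$(printf '%s' "$last" | LC_ALL=C tr 'a-y' 'b-z')
  printf '%s%s%s\n' "$head" "$last" "$tail"
}
```

For example, next_alpha_id az prints ba. Unlike Perl's increment, this sketch does not handle digits or uppercase letters.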

Stéphane Chazelas · Answer 4 · 2015-06-22T14:53:45.593

Try:

perl -le 'print $ARGV[-1] =~ s/[\da-zA-Z]+(?=\.)/++($i=$&)/er' file*.txt

That will give you file_10.txt after file_9.txt, file_g.txt after file_f.txt, file_aa.txt after file_z.txt, but not file_ab.txt after file_aa.txt or file_11.txt after file_10.txt because the file* shell glob will sort file_z.txt after file_aa.txt and file_9.txt after file_10.txt.

That latter one you can work around with zsh by using file*.txt(n) instead of file*.txt.

Or you can define a numeric sort order in zsh, based on those aa, abc being recognised as numbers in base 36:

b36() REPLY=$((36#${${REPLY:r}#*_}))
perl ... file_*.txt(no+b36)

(note that the order is ...7, 8, 9, a/A, b/B..., z/Z, 10, 11... so you don't want to mix file_123.txt and file_aa.txt).
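
In a shell without zsh's glob qualifiers, the ordering problem described above can be worked around by letting a longer name always win. A minimal sketch, assuming all names share the same prefix and extension and do not mix letters with digits; highest_id is a hypothetical name.

```shell
# Hypothetical helper: print the argument that is "highest" when shorter names
# sort before longer ones and equal-length names keep their glob order.
highest_id() {
  local best='' f
  for f in "$@"; do
    # The glob delivers names in ascending lexical order, so among names of
    # equal length the later one wins; a longer name always wins.
    [ "${#f}" -ge "${#best}" ] && best=$f
  done
  printf '%s\n' "$best"
}
```

The result of highest_id file_*.txt can then be passed as the single argument to the perl command above, so that file_aa.txt is followed by file_ab.txt and file_10.txt by file_11.txt.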

edited Jun 22 '15 at 14:53

answered Jun 22 '15 at 14:32


The perl one-liner looks great! It does not seem to work for the first file0.txt though? It creates file*.txt. – Nicolas Raoul Jun 23 '15 at 08:56
@NicolasRaoul, with a proper shell (zsh, Thompson shell, csh, tcsh, fish, bash -o failglob), that would rather give you a No match error. – Stéphane Chazelas Jun 23 '15 at 09:38

score 0 · Answer 5 · answered Jun 23 '15 at 20:29

This problem can be solved handily in Python using the iterator building blocks available in the itertools module (shown here for Python 3, where the built-in map replaces itertools.imap):

from os.path import isfile
from string import ascii_lowercase
from itertools import dropwhile, chain, product, count

# Generate a, b, ..., z, aa, ab, ... lazily and print the first
# file_<id>.txt that does not exist yet.
print(next(dropwhile(isfile, map('file_{}.txt'.format,
    map(''.join, chain.from_iterable(
    product(ascii_lowercase, repeat=x) for x in count(1)))))))
