
The question is completely rewritten due to things learned from the first two answers and comments that I was unaware of when first asking.


After a photo shoot I come home with files that look like _DSC1234.NEF. NEF is Nikon's camera raw file format, so EXIF data is present in the files.

I would like to automatically rename them with three parts:

  1. Creation date in the format YYYYMMDD
  2. Name of Shoot
  3. Number of Image

So the final file name should look like this

20140707_NameOfShoot_0001.NEF

There are a few issues:

ad 1. Creation date: Sometimes I can only copy and rename the files a few days after shooting, so the date should reflect the date the picture was taken, not the date it was copied. mtime seems like the best bet here or, if possible, the creation date from EXIF.

ad 2. Name of Shoot: This would ideally be a variable I could set as a parameter when calling the script.

ad 3. Number of Image: This should reflect the age of the image, the oldest having the lowest number. The problem is that cameras usually restart numbering at 0000 after they hit 9999, so 9995-9999 can potentially be older than 0000-0004. I am looking for a solution that reflects file age and in this special case would rename

  • _DSC0000.NEF -> 20140707_FOO_0004.NEF
  • _DSC0001.NEF -> 20140707_FOO_0005.NEF
  • _DSC0002.NEF -> 20140707_FOO_0006.NEF
  • ...
  • _DSC9997.NEF -> 20140707_FOO_0001.NEF
  • _DSC9998.NEF -> 20140707_FOO_0002.NEF
  • _DSC9999.NEF -> 20140707_FOO_0003.NEF

Again, either mtime or, if possible, the creation date from EXIF seems right.


From here I have a working solution which renames all .NEF-files in a folder by date:

find -name '*.NEF' | 
gawk 'BEGIN{ a=1 }{ printf "mv %s %04d.NEF\n", $0, a++ }' | 
bash 

Hard coding file-modification-date and the shootname works well, but it would be great if it could be automated.

For the timestamp, I found articles on strftime(), mktime() and systime(), but I don't understand how to use them to return a file's modification date. I also tried setting DATE=$(date +"%Y%m%d") and adding $DATE to the gawk line, which led to the deletion of all files in the current folder (and it is probably the system time anyway, not the file's timestamp).

For the variable, I tried

gawk 'BEGIN{ a=1 }{ printf "mv %s $1_%04d.NEF\n", $0, a++ }' | 

and call the script with ./rename FOO, but FOO is ignored in renaming.

jshlke
  • I also tried using digikam for batch rename, which works fine but took roughly 1sec/file on my last 1600+ photos, which I find is a bit slow – jshlke Jul 07 '14 at 11:01
  • I tried stat -c %W -- * in my home directory just now and returned only a bunch of zeroes. Also GNU's man find page documents that it can handle -printf strftime sequences for %Access, status %Change and modification %Times but nothing about file creation or birth. While that doesn't come close to decisively proving it can't be done, I think it's a good indicator that it's not a commonly available function. – mikeserv Jul 07 '14 at 11:29
  • hm. what's the difference in between change and modification? That would probably do anyway. If birth is unavailable, can you see what the solution above uses for sorting? Cause I don't really understand what it is doing. – jshlke Jul 07 '14 at 11:38
  • that's status change. Have a look here - I think it's a pretty good indicator too. Maybe there are file systems that report on creation time - I'm not sure - but I don't think many do. – mikeserv Jul 07 '14 at 11:43
  • grand, thanks a lot. and yes, that should do. any idea on how I can put that into my filename? – jshlke Jul 07 '14 at 11:45
  • status change? Do you have GNU find? - cause it can do pretty much all of it. – mikeserv Jul 07 '14 at 11:46
  • No, I'll look into that, thanks. And you mean, it would also replace the code I already have? Or would I have to combine it with gawk? – jshlke Jul 07 '14 at 11:50
  • what about stat? Either one should do if sorted. But, yeah, if you have those I can answer the question without gawk. I assume you have them because gawk should be GNU too. – mikeserv Jul 07 '14 at 11:52
  • I absolutely don't insist on gawk, it's just the closest I found when looking for a solution. – jshlke Jul 07 '14 at 11:54
  • Ok - but do you have them? GNU stat and/or GNU find? They'd make it easy. – mikeserv Jul 07 '14 at 11:55
  • I have findutils and coreutils – is that it? I am on crunchbang by the way. – jshlke Jul 07 '14 at 12:03
  • yeah. That will do it. I've just posted the answer - be sure to test it, though, please, to make sure it does do what you want. – mikeserv Jul 07 '14 at 13:23
  • Beware that the change time is almost certainly not what you want — it's always more recent than the modification time, it's typically the last time you copied the file, or changed its permissions, etc. The modification time is often the closest thing you can get to the creation time (and is always closer than the change time). – Gilles 'SO- stop being evil' Jul 08 '14 at 01:23
  • @Seul - in truth I'm inclined to agree with Gilles. I didn't bother to argue it after providing you a link on the subject providing more and better info than any offered here. Still, if you find that he's correct, please let me know as well. stat can do the same with mod time as it does with change time - but I will have to change the %z in my answer to a %y. – mikeserv Jul 08 '14 at 01:31
  • @mikeserv yes, I think mod-time comes closest to what I need – I probably misunderstood the explanation behind your link and have edited my original question. – jshlke Jul 08 '14 at 09:08

2 Answers

8

Most unices don't track a file's creation date¹. “Creation date” is ill-defined anyway (does copying a file create a new file?). You can use the file's modification time, which is by a reasonable interpretation the date at which the latest version of the data was created. If you make copies of the file, make sure to retain the modification time (e.g. cp -p or cp -a if you use the cp command, not bare cp).
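To see the difference between a bare cp and cp -p, here is a minimal sketch. It assumes GNU touch (for -d) and GNU stat (for -c %Y); the helper name demo_cp_mtime and the file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare modification times after a plain copy and a preserving copy.
# Assumes GNU touch (-d) and GNU stat (-c %Y); all names are hypothetical.
demo_cp_mtime() {
    dir=$(mktemp -d) || return 1
    touch -d '2014-07-07 12:00:00' "$dir/orig.NEF"   # pretend shooting time
    cp    "$dir/orig.NEF" "$dir/plain.NEF"           # mtime becomes "now"
    cp -p "$dir/orig.NEF" "$dir/kept.NEF"            # mtime is carried over
    orig=$(stat -c %Y "$dir/orig.NEF")               # mtime in epoch seconds
    plain=$(stat -c %Y "$dir/plain.NEF")
    kept=$(stat -c %Y "$dir/kept.NEF")
    [ "$orig" -eq "$kept" ]  && echo "cp -p kept the mtime"
    [ "$orig" -ne "$plain" ] && echo "bare cp reset the mtime"
    rm -rf "$dir"
}
demo_cp_mtime
```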

A few file formats have a field inside the file where the creator application fills in a creation date. This is often the case for photos, where the camera will fill in some Exif data in JPEG or TIFF images, including the creation time. Nikon's NEF image format wraps around TIFF and supports Exif as well.

There are ready-made tools to rename image files containing Exif data to include the creation date in the file name. The question "Renaming images to include creation date in name" shows two solutions, with exiftool and exiv2.

I don't think either tool lets you include a counter in the file name. You can do your renaming in two passes: first include the date (with as high resolution as possible to retain the order) in the file name, then number the files according to that date part (and chuck away the time). Since modern DSLRs can fire bursts of images (Nikon's D4s shoots at 11fps) it is advisable to retain the original filename as well in the first phase, as otherwise it would potentially lead to several files with the same file name.

exiv2 mv -r %Y%m%d-%H%M%S:basename: *.NEF
# exiv2 uses `strftime(3)`, so `%Y%m%d-%H%M%S` expands to YYYYMMDD-hhmmss.
# :basename: is a naming variable that exiv2's `-r` option provides. See `exiv2 -h` for more.
# Now you have files with names like 20140630-235958_DSC1234.NEF.
# Note that chronological order and lexicographic order agree with this naming format.
i=10000
for x in *.NEF; do
  i=$((i+1))
  mv "$x" "${x%-*}_FOO_${i#1}.NEF"
done

${x%-*} removes everything from the last - character onward. The counter variable i counts from 10000 and is used with the leading 1 digit stripped; this is a trick to get leading zeroes so that all counter values have the same number of digits.
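The two parameter expansions can be tried in isolation. This is only an illustrative sketch: new_name is a hypothetical helper, not part of the loop above, and it takes the intermediate file name, the counter and the shoot name as arguments.

```shell
#!/bin/sh
# Sketch of the two expansions used in the loop (new_name is a made-up helper).
# usage: new_name INTERMEDIATE_NAME COUNTER SHOOTNAME
new_name() {
    x=$1
    i=$((10000 + $2))        # 1 -> 10001, 42 -> 10042
    # ${x%-*} drops the shortest suffix matching "-*" (the time and old name)
    # ${i#1}  drops the leading "1", leaving a zero-padded four-digit counter
    printf '%s\n' "${x%-*}_$3_${i#1}.NEF"
}
new_name 20140630-235958_DSC1234.NEF 1 FOO    # prints 20140630_FOO_0001.NEF
```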

Rename files by incrementing a number within the filename has other solutions for renaming a bunch of files to include a counter.

If you want to use a file's timestamp rather than Exif data, see Renaming a bunch of files with date modified timestamp at the end of the filename?
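For the timestamp route, a minimal sketch could look like the following. It assumes GNU stat (-c '%Y %n') and GNU date (-r FILE), file names without newlines or leading spaces, and that no file already carries a target name; rename_by_mtime is a hypothetical function name. Files are numbered by modification time, so a _DSC9999.NEF shot before _DSC0000.NEF gets the lower number.

```shell
#!/bin/sh
# Sketch: number *.NEF files in the current directory by modification time.
# Assumes GNU stat and GNU date; rename_by_mtime is a made-up name.
# usage: rename_by_mtime SHOOTNAME
rename_by_mtime() {
    shoot=$1
    # One line per file: "<mtime in epoch seconds> <name>", oldest first.
    # Files with the same second fall back to name order.
    stat -c '%Y %n' -- *.NEF | sort -n | {
        i=10000
        while read -r secs name; do
            i=$((i + 1))
            day=$(date -r "$name" +%Y%m%d)     # mtime as YYYYMMDD
            mv -- "$name" "${day}_${shoot}_${i#1}.NEF"
        done
    }
}
```

Calling rename_by_mtime FOO in the directory holding the files performs the renames directly, so it is worth trying on copies first.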


As a general note, don't generate shell code and then pipe it into a shell. It's needlessly convoluted. For example, instead of

find -name '*.NEF' | 
gawk 'BEGIN{ a=1 }{ printf "mv %s %04d.NEF\n", $0, a++ }' | 
bash

you can write

find -name '*.NEF' | 
gawk 'BEGIN{ a=1 }{ system(sprintf("mv %s %04d.NEF\n", $0, a++)) }'

Note that both versions could lead to catastrophic results if a file name contained shell special characters (such as spaces, ', $, `, etc.) since the file name is interpreted as shell code. There are ways to turn this into robust code, but this isn't the easiest approach, so I won't pursue that approach.
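For comparison, a plain shell loop never passes the file name through a second round of parsing, so special characters are harmless. This is a sketch rather than a drop-in replacement: number_files is a hypothetical name, it only looks at the current directory, it numbers in glob order rather than find order, and it would overwrite a file that is already called 0001.NEF and so on.

```shell
#!/bin/sh
# Sketch: number the *.NEF files of the current directory without generating
# shell code. number_files is a made-up name; existing NNNN.NEF files would
# be overwritten, so this assumes none are present.
number_files() {
    n=0
    for f in ./*.NEF; do
        [ -e "$f" ] || continue                  # glob matched nothing
        n=$((n + 1))
        mv -- "$f" "$(printf '%04d.NEF' "$n")"   # the name is data, never code
    done
}
```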


¹ Note that there is something called the “ctime”, but the c isn't for creation, it's for change. The ctime changes every time anything changes about the file, either in its content or in its metadata (name, permissions, …). The ctime is pretty much the antithesis of a creation time.
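A quick way to watch the two timestamps diverge, assuming GNU touch and GNU stat (%Y is the mtime and %Z the ctime, both in epoch seconds); demo_ctime is a made-up helper name:

```shell
#!/bin/sh
# Sketch: a metadata change leaves the mtime alone while the ctime is recent.
# Assumes GNU touch (-d) and GNU stat (-c); demo_ctime is hypothetical.
demo_ctime() {
    f=$(mktemp) || return 1
    touch -d '2014-07-07 12:00:00' "$f"     # set the mtime to a past date
    m_before=$(stat -c %Y "$f")
    chmod 644 "$f"                          # metadata-only change
    m_after=$(stat -c %Y "$f")
    c_after=$(stat -c %Z "$f")
    [ "$m_before" -eq "$m_after" ] && echo "mtime unchanged by chmod"
    [ "$c_after" -gt "$m_after" ]  && echo "ctime is newer than mtime"
    rm -f "$f"
}
demo_ctime
```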

  • You make a very general point and then base it on very specific examples - this is very close to a baseless argument. Don't generate shell code then pipe it to a shell is the statement to which I refer. There is another answer here which already does this robustly, but the method for which you have seemingly ignored altogether. – mikeserv Jul 08 '14 at 00:50
  • Thanks for your input. It works pretty well but has two small issues and a larger one. – jshlke Jul 08 '14 at 08:52
  • @Seul What do original file names look like? I thought that NameOfShoot was from the original file name. Regarding the dates, I'm not familiar with the NEF format — does it include a creation stamp inside the file format? For example JPEG images can include the date at which the picture was taken (it's part of the EXIF data), many (but not all) cameras do that. Regarding the sorting, do you want the oldest file to be 0001, the next-oldest to be 0002, etc.? – Gilles 'SO- stop being evil' Jul 08 '14 at 10:32
  • yes, NEF is Nikon's camera Raw format, so all exif is present. the original format is _DSC1234.NEF, will add that to original question, too. – jshlke Jul 08 '14 at 11:13
  • and yes, oldest should get the lowest number. –– on a sidenote: I get asked by stackexchange if I would rather chat than add all this to comments. I am not too familiar with etiquette here, please let me know if I can improve my style. – jshlke Jul 08 '14 at 11:22
  • @Seul I've significantly rewritten my answer, now that I assume that these files contain Exif data, the right solution is to extract that Exif data and there are ready-made tools to rename files based on the creation time recorded in Exif. – Gilles 'SO- stop being evil' Jul 08 '14 at 12:49
  • This works perfectly, thanks a million! And sorry I was not specific enough in my original question – learned a lot along the way, though. – jshlke Jul 08 '14 at 13:19
  • @Gilles - this is really good. I appreciate the rewrite as well. – mikeserv Jul 10 '14 at 09:31
  • @Gilles: I edited your answer to cater for the problematic case when bursts of images have the same time stamp. Now it should be a very robust solution that only breaks if a burst of images with the same time stamp is shot over the 9999-0000 reset. I have no idea how to solve this, but I think it is such a rare case that it might not be important. – jshlke Jul 10 '14 at 14:56
  • @Gilles: one thing I noticed while testing is that I had to change the character that triggers the deletion of the following string in the second part of your script. My original file names already have a _ in place, so the script used the wrong one. I have no better idea than using - but maybe there is a solution that is more generic. – jshlke Jul 10 '14 at 14:58
4
stat --printf='}" "%z_${SN}_${LINENO}"\n' -- * | 
nl -nln -w1 -s '' |
sort -k2,3 | 
sed 's| ..:[^_]*||;s|-||g;s|^|echo mv "${|' |
SN=SHOOTNAME sh -s -- *

The above command should do what you need. It is perhaps more general-purpose than you have requested - but please see the bottom of this answer for a more specific example.

It works by *globbing all the files in the current directory and printing out their change times. For just files in the current directory containing the suffix .NEF you'll want to change the *globstar at the ends of both lines 1 and 5 to *.NEF. It appends some shell variables and quotes to the end - the names for which only exist at the other end of the pipeline in the sh subshell.

Also, because we specify the filenames just by their glob order - or the ${1} shell type parameters this works fine with any filename - whatever weird characters it may contain.

For now the command includes an echo - it's nerfed. Running it is essentially a no-op - it just shows you what it wants to do. Here's its output from my home directory before feeding it to sh:

echo mv "${2}" "20140611_${SN}_${LINENO}"
echo mv "${4}" "20140614_${SN}_${LINENO}"
echo mv "${11}" "20140617_${SN}_${LINENO}"
echo mv "${7}" "20140622_${SN}_${LINENO}"
echo mv "${8}" "20140622_${SN}_${LINENO}"
echo mv "${1}" "20140624_${SN}_${LINENO}"
echo mv "${10}" "20140704_${SN}_${LINENO}"
echo mv "${5}" "20140704_${SN}_${LINENO}"
echo mv "${9}" "20140704_${SN}_${LINENO}"
echo mv "${12}" "20140705_${SN}_${LINENO}"
echo mv "${3}" "20140705_${SN}_${LINENO}"
echo mv "${13}" "20140706_${SN}_${LINENO}"
echo mv "${6}" "20140706_${SN}_${LINENO}"

Here it is after sh:

mv Desktop-1 20140611_SHOOTNAME_1
mv Library 20140614_SHOOTNAME_2
mv target.txt 20140617_SHOOTNAME_3
mv script.sh 20140622_SHOOTNAME_4
mv script.sh~ 20140622_SHOOTNAME_5
mv Desktop 20140624_SHOOTNAME_6
mv shot-2014-06-22_17-11-06.jpg 20140704_SHOOTNAME_7
mv Terminology.log 20140704_SHOOTNAME_8
mv shot-2014-06-22_17-10-16.jpg 20140704_SHOOTNAME_9
mv test 20140705_SHOOTNAME_10
mv Downloads 20140705_SHOOTNAME_11
mv test.tar 20140706_SHOOTNAME_12
mv new
file 20140706_SHOOTNAME_13

You might notice that my output shows a few image files already named for their creation time but their new assigned name does not match. This is not an effect of the sort - which works as prescribed - but rather that those files last had a status change on that date. Nevertheless, as you've specified in the comments on this question ctime is the property you're looking for, that is the sort and name property offered here. Still, here is stat's output with filenames attached:

stat -c '%z %n' -- *

2014-06-24 16:50:09.110283839 -0700 Desktop
2014-06-11 23:34:02.981981145 -0700 Desktop-1
2014-07-05 01:00:43.213344635 -0700 Downloads
2014-06-14 10:32:13.537014418 -0700 Library
2014-07-04 23:02:25.079690701 -0700 Terminology.log
2014-07-06 11:24:05.398936386 -0700 new
file
2014-06-22 11:26:53.658004123 -0700 script.sh
2014-06-22 11:26:53.658004123 -0700 script.sh~
2014-07-04 13:34:00.063296353 -0700 shot-2014-06-22_17-10-16.jpg
2014-07-04 13:34:00.066629687 -0700 shot-2014-06-22_17-11-06.jpg
2014-06-17 19:59:38.475358571 -0700 target.txt
2014-07-05 23:53:39.097065292 -0700 test
2014-07-06 00:38:57.060521397 -0700 test.tar

The above output should also help me demonstrate what the whole pipeline is doing.

  • So stat --printf='}" "%z_${SN}_${LINENO}"\n' prints out lines that look like:

    }" "YYYY-MM-DD HH:MM:SS.NS -TZ_${SN}_${LINENO}"

... where YMDHMS.NS -TZ are the various strftime components they represent for ctime. Its output format is identical for %w - time of file birth - %x - time of last access - or %y - time of last modification, and so substituting any one of these for %z in the statement above will expand instead to their values. As we've already discussed in the comments though, %w - file birth time - is not a reliable attribute, and where it is unsupported it expands only to 0.

It does this for every file the shell globs for it in * or whatever shell glob you provide it - such as *.NEF for only files in the current directory with the .NEF suffix.

  • That list is handed to nl which numbers each line incremented by 1. Its line -numbers are ln left-justified and not zero-padded with a minimum -width of 1 and only a '' null-string to -separate them from the contents of the line. It outputs:

    I}" "YYYY-MM-DD HH:MM:SS.NS -TZ_${SN}_${LINENO}"

... where I is the number of each line.

  • sort sorts its input from the -k2,3 second field through the third - or on YYYY-MM-DD HH:MM:SS.NS. Since at this point the only unique quality of any line is either I or the date and I is skipped, there is no need to be any more specific than that. This also addresses the comment you made regarding files named by number and not by date. I should have done the sort before sed in the first place but it didn't occur to me to sort on minutes and seconds and the like.

My test base for this was generated like:

for s in 9 8 7 6 5 4 3 2 1; do touch $s && sleep 1; done

If I sort after sed - as I did before this edit - then this would rename files so that 9 became ${DATE}.SHOOTNAME.9 because each file's YMD fields are identical and sort would not affect their line order. But with this adjustment this command is sort specific to the nanosecond and so 9 is renamed to ${DATE}.SHOOTNAME.1 and vice versa for 1. Thanks, @Seul, for bringing that to my attention.

  • sed then removes the first string that looks like <space><any char><any char>:<colon> and all characters that follow in sequence that are ^not an _underscore. So at this point the line looks like:

    I}" "YYYY-MM-DD_${SN}_${LINENO}"

...next it removes all -dashes. And finally it inserts echo mv "${ at the ^head of each line, so it looks like this:

    echo mv "${I}" "YYYYMMDD_${SN}_${LINENO}"

  • Last a shell is invoked with the environment variable $SN declared - here its value is SHOOTNAME. POSIX specifies that the shell increment the var $LINENO for every line it reads in, so for each line we feed it that value in the filename should expand to one more than the last. If - as your comments indicate - for some reason this does not occur, a perfectly valid substitute is $((i=i+1)) as first printed by stat in line 1 of the pipeline and as I've provided in a specific example below.

The shell is also invoked with its positional parameters set to our glob - here the *globstar for all files in the current directory. As already mentioned, *.NEF in this line and in the first would serve to only operate on files in the current directory with the filename suffix of .NEF.

So long as its glob is the same as that in the first line, it will glob them in the same order as nl numbered them. So no matter the line on which it occurs "${1}" will expand to the same filename we've assigned it according to nl's output. This way you can rename the files in date order quickly and safely and in the correct order.

  • As also already mentioned, I have nerfed the command with echo here. But if you run the echo and find it suits you after all you'll need to remove the echo.

Like this:

stat --printf='}" "%z_${SN}_${LINENO}"\n' -- * | 
nl -nln -w1 -s '' |
sort -k2,3 |
sed 's| ..:[^_]*||;s|-||g;s|^|mv "${|' |
SN=SHOOTNAME sh -s -- *

Or maybe:

export SN=SHOOTNAME SUFX=.NEF
stat --printf='}" "%z_${SN}_$((i=i+1))${SUFX}"\n' -- *$SUFX | 
nl -nln -w1 -s '' |
sort -k2,3 |
sed 's| ..:[^_]*||;s|-||g;s|^|mv "${|' |
sh -s -- *$SUFX
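To watch the stages work without touching any files, here is a sketch that feeds two made-up lines, in the shape stat would print them, through the same nl, sort and sed steps and then into sh. The function names fake_stat and generate and the two file names are hypothetical, and the echo is left in so nothing is renamed.

```shell
#!/bin/sh
# Sketch: the pipeline stages run on fake stat output (no files are touched).
# fake_stat and generate are made-up names for this illustration.
fake_stat() {
    # Newer file first, as if the glob had listed it first.
    printf '%s\n' \
        '}" "2014-07-06 11:24:05.398936386 -0700_${SN}_$((i=i+1))"' \
        '}" "2014-06-11 23:34:02.981981145 -0700_${SN}_$((i=i+1))"'
}
generate() {
    fake_stat |
    nl -nln -w1 -s '' |      # prefix each line with its glob position
    sort -k2,3 |             # order by date and time, oldest first
    sed 's| ..:[^_]*||;s|-||g;s|^|echo mv "${|'
}
generate
# echo mv "${2}" "20140611_${SN}_$((i=i+1))"
# echo mv "${1}" "20140706_${SN}_$((i=i+1))"

# The positional parameters stand in for the globbed file names.
generate | SN=FOO sh -s -- 'new file.NEF' 'old file.NEF'
# mv old file.NEF 20140611_FOO_1
# mv new file.NEF 20140706_FOO_2
```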

And here it is worked into a shell function:

_batch_date_rename () ( # a big one
    ERR= # for error reporting
    # args 1,2 must be dirname and file suffix; arg 3 is the needed name string
    # (a comment after a line-continuation backslash would break the export)
    export "DIR=$1" "SUFX=$2" \
        "NAME=${3-${ERR:?no rename string specified}}" \
        "TIME=${4-%y}" INT=$((${INT:-25}*3)) ${NOCONFIRM+NOCONFIRM=}
        #all above vars are exported to all points below
    _path_chk () { #run once at start - fn quits if any below test fails
        [ -d "$1" ] && [ -w "$1" ] && set -- "$1"/*"$2" && [ -e "$1" ]
    } # chks for user writable dirname and resolvable $1/*$2 glob
    _print_fmt () { #shell printf now not stat - last field zero padded
        printf 'mv "${%d}" "${DIR}/%d_${NAME}_%04d${SUFX}"\n' "$@"
    }
    _print_mv () { #prints copy of mv action before attempting 
        echo '(set -x' #uses shells debug printer to show expanded vals
        printf ': ${0+%s}\n' "$@" \
            ${NOCONFIRM-'Key "ENTER" to accept or "CTRL+C" to quit'}
        echo \) #above can be disabled by declaring NOCONFIRM at invocation
    } #by default fn batches 25 mvs at a time, displays them, and confirms
    _read_loop () { #parses piped in with IFS, batches in INTerval of 25
        argc=${1-$argc} ; ${1+shift} #total globbed files - quit point
        while IFS=' -' read nl y m d na ; do #split on -
            set -- "$@" "$nl" "$y$m$d" "$((i=i+1))" #build array until
            [ "$#" -ge "$INT" ] && break #hit interval
        done ; IFS='
';      set -- $(_print_fmt "$@") && unset IFS #finalize array in _print_fmt
        _print_mv "$@" #do the debug out
        ${NOCONFIRM+:} read < /dev/tty #if $NOCONFIRM not set confirm
        printf '%s\n' "$@" #now print the actual command
        [ $((argc>i)) -eq 1 ] || echo 'exit 0' #check if quit point     
        _read_loop #if not quit repeat
    }                                                                    
    _pipeline () { #this is mostly same - no sed though
        stat -c "$TIME" -- "$@" | nl -nln -w1 -s ' ' | sort -k2,3 | {
            _read_loop $# || echo 'exit 1' #read loop stands in for sed
        } | sh -s -- "$@" #sh still evaluates on args
    } #only two calls from main function below
    _path_chk "$1" "$2" || ${ERR:?Invalid pathname parameters specified}
    _pipeline "$DIR"/*"$SUFX" #if _path_chk do _pipeline
) #that's all folks

That uses the shell to do some things I was doing with other utilities. The concept is the same - glob a file list, sort in different ways and store the sort order. What's really different about this, though, is that it batches move operations by interval, displays to the user what it is about to do and awaits a prompt before continuing. I recorded myself using it here so you can watch a terminal session of how it works.

mikeserv
  • First of all, thanks a lot for taking the time to come up with this and explain it so well. I really appreciate it. However, it is so far beyond my skill level that I really don't understand much, even with your profound explanations. When I run the script on a directory of test files (the script being in the same directory), it shows it is going to rename all files to the same file name; it does not show any incremented numbers. When I run the version without echo, all files are gone afterwards and the script itself gets renamed to 20140707_SHOOTNAME_ – jshlke Jul 07 '14 at 14:35
  • @Seul? Why did you run it without echo? Oh, I see, with test files - sorry. It shows that because for whatever reason your sh is not supporting $LINENO. Can you try doing echo $LINENO? I don't understand why it would not be supported - you have copied in the quotes exactly, right? They're important. And I can't speak for bash - so I wouldn't recommend using it - but sh should work with this. $LINENO is specified by POSIX. Also - don't put the script in the same directory - else change the * to *.NEF in both the first and last lines. – mikeserv Jul 07 '14 at 20:56
  • @Seul - if $LINENO doesn't work, this probably will - try changing the first line to - stat --printf='}" "%z_${SN}_$((i=i+1)).NEF"\n' -- *.NEF | ... and the last line to ... SN=SHOOTNAME sh -s -- *.NEF – mikeserv Jul 07 '14 at 21:02
  • Thanks for getting back. I tried with sh and that works better. However, three issues with your last piece of code (after "or maybe"): – jshlke Jul 08 '14 at 08:26
  • a) the file extension gets lost – jshlke Jul 08 '14 at 08:26
  • b) the date prefix is indeed wrong. I created the test-files yesterday, but copied them into a new test folder today. They get today's date, so I possibly misunderstood the explanation behind the link you sent yesterday and am not looking for ctime after all. Sorry for the confusion, but as I said, my skill level here is pretty low. I also tried @Gilles' solution and there the date is right (yesterday). – jshlke Jul 08 '14 at 08:31
  • c) the sorting. I created four files: _DSC12345.NEF-_DSC12348.NEF. Then as last file a fifth _DSC0000.NEF. I also modified the last one after creating it. By »sorting by creation date« I hoped that 0000 becomes the file with the highest number after renaming. This is important because my camera's file naming jumps to 0000 after 9999, so potentially I can shoot 10 images with 9995-9999 being before 0000-0004. In your solution, file 0000 becomes 1 where it should be 5. – jshlke Jul 08 '14 at 08:49
  • @Seul - I know about the date - I suspected it as well. As I said in the answer though - you can use the various forms interchangeably with this: - you can try any one of %w-%z and find which you like best. They'll all work the same because the format is identical. The 0000 situation you describe is due to all files having the same date. I can fix that by making the times more specific - as is it chops the NS and etc fields from the output, and so they're only sorted by date - not minutes or seconds. That's very easily handled. – mikeserv Jul 09 '14 at 23:16
  • @Seul - and sorry about the file extension - I added it. I'm trying to think of the best way to do the other thing. Should only be a few minutes. – mikeserv Jul 09 '14 at 23:21