11

I am trying to OCR some documents in situ (from a Linux command line on a Windows share). The OCR process itself is fine, and I have muddled through using the find command to pipe the files through the loop correctly.

However, I need to preserve the original modification timestamp. I am currently trying to use stat and touch as below:

#!/bin/bash
OLDIFS=$IFS

    IFS=$(echo -en "\n\b")

    for f in `find /mnt/library/Libra/Libra/Ashfords -name "*.pdf"`
         do
        ORIGTS=`stat -c "%Y" $f`
        sudo /opt/ABBYYOCR9/abbyyocr9 -rl English -pi -if $f -f PDFA -paemImageOnText -pafpr original -of $f
        touch -t $ORIGTS $f

    done

    IFS=$OLDIFS

Of course the touch command fails. Running the commands separately, I notice that the output of "stat -c" is something along the lines of this:

1334758696

which is like no date I know. I feel as if I am close, but I cannot work out how to convert the date I have into a touch-friendly format. Is it some form of seconds from something?

  • Aside: your use of IFS seems unusual. Did you really want to split on backspace (\b)? See http://unix.stackexchange.com/questions/9496/looping-through-files-with-spaces-in-the-names/9499#9499 for some tips. – Mikel Apr 18 '12 at 19:31

5 Answers

18

stat's output is a Unix timestamp, also called seconds since the Epoch.

All GNU coreutils commands that accept a date allow you to supply a timestamp instead by prefixing it with an @.

So try this

touch -d "@$ORIGTS" "$f"

See coreutils - Seconds since the epoch
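As a rough illustration of the round trip, here is a minimal sketch that assumes GNU coreutils (stat -c, touch -d @N and date -d @N). The temporary directory, the file name and the append that stands in for the OCR step are all hypothetical; only the timestamp 1334758696 comes from the question.

```bash
#!/bin/bash
# Sketch only: assumes GNU stat, touch and date.
workdir=$(mktemp -d)
f="$workdir/sample file.pdf"          # hypothetical name, with a space on purpose
printf 'original\n' > "$f"
touch -d @1334758696 "$f"             # give the file the timestamp from the question

orig_ts=$(stat -c %Y "$f")            # mtime as seconds since 1970-01-01 00:00:00 UTC
printf 'changed\n' >> "$f"            # stand-in for the OCR step; this bumps the mtime
touch -d "@$orig_ts" "$f"             # put the saved mtime back
restored_ts=$(stat -c %Y "$f")

# Show the same instant in a human-readable form (UTC).
human=$(date -u -d "@$orig_ts" '+%Y-%m-%d %H:%M:%S')
echo "$restored_ts = $human UTC"      # prints: 1334758696 = 2012-04-18 14:18:16 UTC
rm -rf "$workdir"
```

Quoting "$f" and "@$orig_ts" keeps the commands working for file names that contain spaces.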

Mikel
  • Ah, that explains a lot of timestamps I have seen in Linux now! Thanks a lot –  Apr 18 '12 at 15:38
8

touch can copy another file's timestamp with the -r option. You might want to write the output to a different file (I assume below that -if is the input file and -of is the output file).

for f in ...; do
    sudo /opt/ABBYYOCR9/abbyyocr9 ... -if "$f" ... -of "$f.new"
    touch -r "$f" "$f.new"
    mv "$f.new" "$f"
done
glenn jackman
3

IFS=$(echo -en "\n\b")

Since you're assuming a shell with echo -e, and you have bash in your shebang line anyway, you can use IFS=$'\n\b'. Making backspace a separator is rather weird. You don't need IFS for what you're doing anyway.

OLDIFS=$IFS

IFS=$OLDIFS

Note that this restores the old value of IFS only if IFS was initially set. If IFS was initially unset, this sets IFS to the empty string, which is completely different. In ksh, bash or zsh, if you need to set IFS temporarily, you can write your code in a function and make IFS local to this function. In other shells, you need to be careful about the unset case.
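A minimal sketch of the function-local approach in bash follows. The function name count_lines and the variable line_count are made up for illustration, and the set +f at the end assumes globbing was enabled before the call.

```bash
#!/bin/bash
# Sketch: confine an IFS change to one function with `local`.
count_lines() {
    local IFS='
'                       # a single newline; the outer IFS is left untouched
    set -f              # no globbing while the argument is split
    set -- $1           # deliberately unquoted: split on newlines only
    set +f              # assumes globbing was on before the call
    line_count=$#
}

unset IFS               # simulate the "IFS was initially unset" case
count_lines "one two
three four"

if [ -z "${IFS+x}" ]; then ifs_state=unset; else ifs_state=set; fi
echo "lines: $line_count, IFS afterwards: $ifs_state"   # prints: lines: 2, IFS afterwards: unset
```

Because IFS is local, there is no OLDIFS to save, and the unset state survives the call.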

`find /mnt/library/Libra/Libra/Ashfords -name "*.pdf"`

Never use command substitution on the output of find.

  • This splits the output at the characters in $IFS. If you set IFS to a newline, then this splits the output at newlines, but you still can't handle file names containing newlines.
  • Not only is the result of command substitution split into words, but then each word is used as a glob pattern. If you have files called A[12].pdf, A1.pdf and A2.pdf, you'll end up with A1.pdf A2.pdf A1.pdf A2.pdf. You can turn globbing off with set -f (and back on with set +f), but here (like most of the time) the right way is not to use command substitution.
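The globbing problem is easy to reproduce. This is a small demonstration in a throwaway directory (the directory and the counter variables are hypothetical), using the three file names from the example above:

```bash
#!/bin/bash
# Demonstration only: why looping over $(find ...) is fragile.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch 'A[12].pdf' A1.pdf A2.pdf          # three files

unsafe_count=0
saw_literal=no
for f in $(find . -name '*.pdf'); do     # unquoted: word splitting, then globbing
    unsafe_count=$((unsafe_count + 1))
    if [ "$f" = './A[12].pdf' ]; then saw_literal=yes; fi
done
echo "loop iterations: $unsafe_count, saw A[12].pdf: $saw_literal"
# prints: loop iterations: 4, saw A[12].pdf: no
cd / && rm -rf "$demo"
```

The loop runs four times over three files, and the file literally named A[12].pdf is never processed.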

Use the -exec argument to find (or if your system has -print0, you can use find … -print0 | xargs -0 … instead; this is only useful to act on multiple files at once if you need portability to ancient Linux systems or current OpenBSD systems that have -print0 but not -exec … {} +).
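Both forms can be sketched as follows. The file names are hypothetical and deliberately awkward (a space, an embedded newline), and printf x is only a stand-in for the real per-file command:

```bash
#!/bin/bash
# Sketch: two ways to hand file names from find to a command safely.
demo=$(mktemp -d)
nl='
'
touch "$demo/plain.pdf" "$demo/with space.pdf" "$demo/two${nl}lines.pdf"

# Form 1: find passes the names as arguments itself, in batches ({} +).
exec_count=$(find "$demo" -name '*.pdf' -exec sh -c '
    for f in "$@"; do printf x; done' sh {} + | wc -c)

# Form 2: NUL-separated names piped to xargs -0 (needs -print0 support).
xargs_count=$(find "$demo" -name '*.pdf' -print0 |
    xargs -0 sh -c 'for f in "$@"; do printf x; done' sh | wc -c)

echo "files seen via -exec: $exec_count, via xargs: $xargs_count"
rm -rf "$demo"
```

Each form emits one x per file, so both counts come out as 3, including the name that contains a newline.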

ORIGTS=`stat -c "%Y" $f`
# [transform $f]
touch -t $ORIGTS $f

Note that you're missing double quotes around $f (they aren't needed if these are the results of splitting and you haven't changed IFS since then and globbing is turned off, but really, always put double quotes unless you know why you can't leave them on).

This is clumsy and non-portable (stat doesn't exist on all systems, and its arguments are different across the different systems where it exists). touch has a portable option to set a file to the timestamp of another file: touch -r REFERENCE_FILE FILE. I would recommend one of two approaches instead:

  • If you can, first transform the original file into a new file, then call touch -r to set the date of the new file, and finally move the new file into place. It's better to make sure the output is fine before anything happens to the input; otherwise, if the transformation is interrupted for any reason (e.g. a power failure), you'll lose data.
  • If the transformation is a black box that you have no control over, you can use touch -r twice: once to save the date of the original file on an empty temporary file (which will be automatically created), then again after the transformation to restore the date using the temporary file.

Thus:

find /mnt/library/Libra/Libra/Ashfords -name '*.pdf' \
     -exec sh -c 'transform "$0" to "$0.tmp" && touch -r "$0" "$0.tmp" && mv -f "$0.tmp" "$0"' {} \;
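For the second approach (a black-box transformation that rewrites the file in place), a sketch along these lines may help. The names preserve_mtime and transform_in_place are hypothetical, and the append merely stands in for the real tool:

```bash
#!/bin/bash
# Hypothetical stand-in for a tool that rewrites its input in place.
transform_in_place() { printf 'processed\n' >> "$1"; }

# Save the mtime on a temporary reference file, transform, then restore it.
preserve_mtime() {
    file=$1
    ref=$(mktemp) || return 1
    touch -r "$file" "$ref"          # copy the original mtime onto the temp file
    transform_in_place "$file"
    status=$?
    touch -r "$ref" "$file"          # copy it back after the transformation
    rm -f "$ref"
    return "$status"
}

workdir=$(mktemp -d)
f="$workdir/doc.pdf"
expected="$workdir/expected"
printf 'original\n' > "$f"
: > "$expected"
touch -t 201204181518.16 "$f" "$expected"   # arbitrary old date for the demo

preserve_mtime "$f"
if [ "$f" -nt "$expected" ] || [ "$f" -ot "$expected" ]; then
    result=changed
else
    result=preserved
fi
if grep -q processed "$f"; then content=transformed; else content=untouched; fi
echo "mtime $result, content $content"
rm -rf "$workdir"
```

Only touch -r is needed here, so the sketch does not depend on any particular stat implementation.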
0

For some reason I missed the answer about touch -r; if for some strange reason you neither have GNU coreutils’ stat as in the accepted answer nor can use touch -r, here's how to get the timestamp in touch-friendly format with a BSD-like stat.

% /usr/bin/stat -f '%Sm' johnson                   
Oct 23 22:51:00 2012
% /usr/bin/stat -t '%Y%m%d%H%M.%S' -f '%Sm' johnson
201210232251.00
% touch foo
% touch -t $(/usr/bin/stat -t '%Y%m%d%H%M.%S' -f '%Sm' johnson) foo
% /usr/bin/stat -f '%Sm' foo                    
Oct 23 22:51:00 2012

But really, just use touch -r:

% touch foo
% touch -r johnson foo
% /usr/bin/stat -f '%Sm' foo
Oct 23 22:51:00 2012
0

I had the same problem, coming from a movie-making process.

In the example below orig_file.wav is the file with original timestamp, while processed_file.wav is the file with same contents, but wrong timestamp.

BEFORE:

localhost $ ls -lh orig_file.wav processed_file.wav
Jan 23 17:15 processed_file.wav
Jul 9 2018 orig_file.wav

THE COMMAND:

localhost $ touch -t $(date --date=@`stat -f%B orig_file.wav` +%Y%m%d%H%M.%S) processed_file.wav

AFTER:

localhost $ ls -lh orig_file.wav processed_file.wav
Jul 9 2018 processed_file.wav
Jul 9 2018 orig_file.wav

NOTES:

The stat command in backticks gives you the creation timestamp of the original file as Unix epoch time (in seconds). The @ prefix tells GNU date to read that number as seconds since the epoch, and the +%Y%m%d%H%M.%S format prints it as YYYYMMDDHHmm.SS so that touch -t can understand it. I put the date command into $(), as an equivalent of backticks, because backticks cannot be nested in the same command without escaping. Note that stat -f%B is BSD stat syntax, while date --date is GNU date, so this exact command needs both.
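On a system that has only GNU coreutils, a comparable sketch would use stat -c instead of stat -f. This is an assumption-laden variant, not the command above: it copies the modification time (%Y) rather than the creation time, and the temporary directory and the 12:00 time of day are made up for the demonstration.

```bash
#!/bin/bash
# Sketch only: assumes GNU stat and GNU date.
workdir=$(mktemp -d)
orig="$workdir/orig_file.wav"
processed="$workdir/processed_file.wav"
: > "$orig"
: > "$processed"
touch -t 201807091200.00 "$orig"      # hypothetical original timestamp

# Nested $(...) needs no escaping, unlike nested backticks.
stamp=$(date -d "@$(stat -c %Y "$orig")" +%Y%m%d%H%M.%S)
touch -t "$stamp" "$processed"

orig_secs=$(stat -c %Y "$orig")
processed_secs=$(stat -c %Y "$processed")
echo "touch -t argument: $stamp"
rm -rf "$workdir"
```

Both date and touch -t work in the local time zone here, so the conversion round-trips without an explicit TZ setting.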

dominikz