
Goal

I was looking for a simple way to check for new files.

As the target system is a minimal embedded Linux platform, I cannot just install more packages.

Current Solution

A nice solution seemed to be using find ... -newer reference.file and running it repeatedly, touching reference.file after each run, as suggested here: https://unix.stackexchange.com/a/249238/562136

In my case, the code looks like this:

NEW_FILES=()
WORK_FILE_DIR="/some/folder/path"
REFERENCE_FILE="${WORK_FILE_DIR}/.last_checked_reference"

function find_new_files() {
    mapfile -t NEW_FILES < <(find "$WORK_FILE_DIR" -type f -newerBa "$REFERENCE_FILE" -name "*.txt")
    touch "$REFERENCE_FILE"
}

mkdir -p "$WORK_FILE_DIR"
touch "$REFERENCE_FILE"

while true; do
    find_new_files

    for file in "${NEW_FILES[@]}"; do
        echo "New file $file"
        # ... handle file content in multiple steps
    done

    echo "--"
    sleep 5
done

Note that I deliberately used -newerBa and not just -newer.
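Independently of the behaviour asked about, ilkkachu's comment further down points out a race in find_new_files: a file created after find finishes but before touch runs is never reported. The following is a minimal sketch of the fix suggested there (stamp a new reference before running find, then rename it into place), not the code actually used in the question. Assumptions: bash 4+ for mapfile, plain -newer instead of -newerBa so it runs where birth times are unavailable, a temporary directory standing in for /some/folder/path, and a hypothetical .next suffix for the temporary reference.

```shell
#!/usr/bin/env bash
# Sketch: a variant of find_new_files() that stamps the *next* reference
# before scanning, so a file created while find runs is not skipped.
# Assumptions: plain -newer instead of -newerBa, bash 4+ for mapfile,
# and a hypothetical ".next" suffix for the temporary reference.
set -eo pipefail

NEW_FILES=()
WORK_FILE_DIR=$(mktemp -d)   # stand-in for "/some/folder/path"
REFERENCE_FILE="${WORK_FILE_DIR}/.last_checked_reference"

find_new_files() {
    local next_ref="${REFERENCE_FILE}.next"
    touch "$next_ref"    # record "now" before find starts scanning
    mapfile -t NEW_FILES < <(find "$WORK_FILE_DIR" -type f -newer "$REFERENCE_FILE" -name "*.txt")
    mv -f "$next_ref" "$REFERENCE_FILE"   # rename keeps the earlier timestamp
}

touch "$REFERENCE_FILE"
sleep 1
touch "$WORK_FILE_DIR/file1.txt"

find_new_files                  # file1.txt is newer than the first reference
first_pass=("${NEW_FILES[@]}")

sleep 1
find_new_files                  # file1.txt is now older than the reference
second_pass=("${NEW_FILES[@]}")

echo "first pass:  ${first_pass[*]:-<none>}"
echo "second pass: ${second_pass[*]:-<none>}"
rm -rf "$WORK_FILE_DIR"
```

As the comment notes, this trades missed files for possible duplicates: a file created between the touch of the new reference and the end of find is reported in this pass and again in the next one.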

Expected behaviour

With -newerBa, only files whose birth (creation) time is newer than the last access time of REFERENCE_FILE should be listed, and touch updates that access time on each run.

I expected the output to look like this:

--
--
New file <file1>
--
--
--
New file <file2>
...
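A minimal sketch of the expected behaviour with the reference-file pattern: a file is listed once, and after the reference is touched again it is no longer listed. It uses plain -newer (modification time compared with the reference's modification time) rather than -newerBa, because birth-time comparisons are not supported by every find implementation or filesystem; the temporary directory and the list_newer_than helper are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the reference-file pattern with plain -newer (mtime vs. the
# reference's mtime), used here instead of -newerBa for portability.
set -euo pipefail

# List *.txt files under $1 whose mtime is strictly newer than that of $2.
list_newer_than() {
    find "$1" -type f -name '*.txt' -newer "$2"
}

demo_dir=$(mktemp -d)                      # hypothetical throwaway directory
ref="$demo_dir/.last_checked_reference"
touch "$ref"

sleep 1                                    # make file1.txt strictly newer
touch "$demo_dir/file1.txt"
first=$(list_newer_than "$demo_dir" "$ref")

sleep 1
touch "$ref"                               # move the reference past file1.txt
second=$(list_newer_than "$demo_dir" "$ref")

echo "first pass:  ${first:-<none>}"
echo "second pass: ${second:-<none>}"
rm -rf "$demo_dir"
```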

Actual behaviour

The output looks like this:

--
--
New file <file1>
--
New file <file1>
--
New file <file1>
--
New file <file1>
New file <file2>
--
New file <file1>
New file <file2>
...

BUT when I touch REFERENCE_FILE externally, that is, from my CLI while the script is running, the touch has the expected effect:

--
--
New file <file1>
--
New file <file1>
-- <-- at this point, touch REFERENCE_FILE from my CLI
--
New file <file2>
--
New file <file2>
...

What I tried

  1. I added stat $REFERENCE_FILE to each iteration and can see that 3 of the 4 times (all but creation date) are updated properly while the script is running.

I also checked stat when updating REFERENCE_FILE from my CLI and cannot see any difference (the four quoted times are access, modification, change and birth time):

16777221 96762726 -rw------- 1 user staff 0 0 "Feb 25 11:53:24 2023" "Feb 25 11:53:24 2023" "Feb 25 11:53:24 2023" "Feb 25 01:10:39 2023" 4096 0 0 <path>/.last_checked_reference
--
16777221 96762726 -rw------- 1 user staff 0 0 "Feb 25 11:53:29 2023" "Feb 25 11:53:29 2023" "Feb 25 11:53:29 2023" "Feb 25 01:10:39 2023" 4096 0 0 <path>/.last_checked_reference
  2. Used touch -a $REFERENCE_FILE and touch -m $REFERENCE_FILE.
  3. Changed the file permissions to 666 or 777 to make sure that 600 is not the problem.

What works

Completely removing and recreating REFERENCE_FILE works:

rm "$REFERENCE_FILE"
touch "$REFERENCE_FILE"

Question

I do not understand why stat shows updated times while the script does not work as intended, yet it reacts as intended to each touch from my CLI.

Why does it behave like this?

  • Is the file system mounted with the noatime option? – Paul_Pedant Feb 25 '23 at 13:42
  • On the chance that I misunderstood: Wouldn't the stat command then show a "-" instead of an actual time for the last access? – L. Heinrichs Feb 25 '23 at 14:08
  • I can't replicate this on Debian. Apart from the B in -newerBa not being supported and having to use -newerca or -newerma instead, it works fine... AFAIU noatime should only inhibit atime changes from reads and writes, not change the fact that the atime field still exists, nor manual changes to it through touch – ilkkachu Feb 25 '23 at 14:30
  • Though note that there's a race there: a file could be created between the moment find finishes and the moment the script updates the reference file timestamp. It would not be listed in that iteration and would be older than the reference on the next, and hence be missed completely. You could prevent that by creating a new reference with another name before running find and moving it into place afterwards, though that might just invert the problem and make some files appear twice. – ilkkachu Feb 25 '23 at 15:30
  • Another way would be to just move processed files away from the directory. Then you wouldn't need to care about timestamps at all; any files in the directory would be new and due for handling. – ilkkachu Feb 25 '23 at 15:30
  • @ilkkachu that stat output format suggests the OP is using some sort of BSD system. – Stéphane Chazelas Feb 25 '23 at 16:52
  • Are you sure the handle file content in multiple steps part you're not showing doesn't recreate the file? – Stéphane Chazelas Feb 25 '23 at 16:53
  • another way would be to just move processed files away <<

    I thought about that. However, the files are created by another process, which may get duplicate input data. It writes the data out to a file if that file does not exist. So if I moved the file away, there is a chance it would be recreated and again treated as new.

    Are you sure the handle file content in multiple steps part you're not showing doesn't recreate the file? <<

    Yup, I also checked that with stat, and no changes to the timestamps occur.

    – L. Heinrichs Feb 25 '23 at 16:53
  • sed -i '' sed-code file, for instance, doesn't edit the file in place; it replaces it with a modified copy, so the birth time would be new. – Stéphane Chazelas Feb 25 '23 at 16:55
  • Oh, that is a bit embarrassing :D I think this might be it, as I am using exactly this. I guess I removed the "important" part of the code thinking it was not important. – L. Heinrichs Feb 25 '23 at 18:23
  • @StéphaneChazelas feel free to add an answer to be accepted. – L. Heinrichs Feb 25 '23 at 20:00
  • @L.Heinrichs The atime is still updated for creates and writes, so all files and directories still have it: the noatime just gives file systems the option to suppress updating it on every read, either per-file or whole fs. Linux Documentation Project has more detail. – Paul_Pedant Feb 26 '23 at 09:19
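Stéphane Chazelas's explanation can be checked without relying on birth times: the sketch below records a file's inode number before and after sed -i. A different inode means the path now points at a newly created file, which would give it a fresh birth time and make -newerBa treat it as new again on the next pass. Assumptions: GNU or BSD sed, -i.bak instead of the question's BSD-specific -i '' so the same command works with both, and a hypothetical inode_of helper built on ls -i.

```shell
#!/usr/bin/env bash
# Sketch: "sed -i" writes a new file and renames it over the old one, so the
# path ends up pointing at a different inode (and, on filesystems that
# record it, a fresh birth time).
set -eo pipefail

# Hypothetical helper: print the inode number of a single file.
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

demo_dir=$(mktemp -d)
f="$demo_dir/file1.txt"
printf 'hello\n' > "$f"

before=$(inode_of "$f")
# -i.bak is accepted by both GNU and BSD sed; the backup is removed below.
sed -i.bak 's/hello/world/' "$f"
rm -f "$f.bak"
after=$(inode_of "$f")

content=$(cat "$f")
echo "inode before: $before, after: $after, content: $content"
rm -rf "$demo_dir"
```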
