"tail -f", but on a file which gets rewritten (downloaded) again and again, without outputting the content over and over again?

Question

I've got log files which get downloaded via a cron job. If a file is updated at the remote location, the local copy gets rewritten from the beginning, even if data has only been appended.

Tools like tail -f or since seem to see this as "the file has been replaced" and start outputting it from the beginning again, i.e. they repeat all the content already shown. Both tools explicitly report this on STDERR.

So if I call this in one terminal:

for j in $(seq 2 4) ; do for i in $(seq 1 $j) ; do echo $i ; sleep 1; done > /tmp/foo; done

I get these warnings with both tail -f /tmp/foo and tail -F /tmp/foo in another terminal:

1
2
tail: /tmp/foo: file truncated
1
2
3
tail: /tmp/foo: file truncated
1
2
3
4

And with while sleep 0.25; do since /tmp/foo; done, I get similar messages:

1
2
since: considering /tmp/foo to be truncated, displaying from start
1
2
3
since: considering /tmp/foo to be truncated, displaying from start
1
2
3
4

since explicitly uses the inode, not the file name, as the key for a file. tail probably does something similar to recognise truncated files.

Another "tool" I tried is the Perl library File::Tail, but it has the same "issue", just without warnings (at least with default settings).

So I wonder: is there a way or a tool which does not look at the inode but only at the contents of the file, and which only starts over if data has not just been appended?

What I would like to have is just this output:

1
2
3
4

(I've seen the question "tail -f, but when the file is deleted and re-created (not appended)", but it does not help, as the solution there still restarts outputting the file's content from the beginning as well.)

And yes, I'm aware that such a tool either needs a cache of all seen (but not truncated) data or at least hashsums of it.

You can do this on a timer (not continuously). You do not need all the data, or hashsums, just a variable containing the number of lines N you have already seen. Then tail -n +$(( N + 1)) file into a variable, wc -l the new lines, add that to N, and write the new lines to your output stream. Then sleep 10 (or some compromise between acceptable delay and acceptable workload) and repeat. You need some strategy to reset N=0 if the file is actually truncated at the server. — Paul_Pedant, Dec 08 '22 at 17:13
If since uses the inode as the key, then don't change the inode. When you update the file, instead of copying it directly to the same file, copy it first to a temporary file, and then copy it back to the original file. Its inode will not change, so since will work. — aviro, Dec 08 '22 at 18:05
@aviro: Yeah, I know. Unfortunately that downloading tool is a 3rd-party tool. (It's all about importing cloud logs into a syslog.) I already thought about making a feature request against since since (sic!) this is what I'd prefer here. But I need something in short term, too. — Axel Beckert, Dec 12 '22 at 07:20
So after downloading it, copy it to another location where the inode would remain the same... — aviro, Dec 12 '22 at 07:48
@Paul_Pedant: Actually, a somewhat more elaborate version of your idea is what I implemented for now. If you want, you can repost that comment as an answer and I'd happily mark it as the solution. Otherwise I will probably post my shell script based on that idea as a solution in a few days or so. — Axel Beckert, Dec 12 '22 at 16:26
@Alex Your implementation would be more helpful to others than my outline, so I would be glad if you posted your solution. — Paul_Pedant, Dec 12 '22 at 22:57
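Axel's own script was not posted here, so the following is only a minimal POSIX sh sketch of the line-counting idea from Paul_Pedant's comment, not that script. The names seen, follow_once and follow_lines are illustrative. It assumes the download only ever appends complete lines, and it treats a file with fewer lines than already seen as "still being re-downloaded" rather than as a real truncation. The head call caps the output at the count measured by wc -l, so lines appended in between, or a trailing line without its newline yet, are picked up on the next poll instead of being counted wrongly.

```shell
#!/bin/sh
# Sketch of the line-counting idea: remember how many lines have been shown
# ($seen) and only print the lines beyond that. Names are illustrative.
seen=0

# follow_once FILE: print the complete lines of FILE after line $seen.
follow_once() {
  [ -r "$1" ] || return 1
  total=$(( $(wc -l < "$1") ))   # newline-terminated lines right now
  if [ "$total" -gt "$seen" ]; then
    # Cap the output at the count measured above, so lines appended in the
    # meantime (or a trailing partial line) are printed on the next call.
    tail -n "+$((seen + 1))" "$1" | head -n "$((total - seen))"
    seen=$total
  fi
  # A file with fewer lines than $seen is assumed to be in the middle of a
  # re-download and is ignored. Detecting a genuine truncation at the server
  # (and resetting seen=0) is deliberately left open here.
}

# follow_lines FILE [INTERVAL]: poll FILE forever (default: every 10 seconds).
follow_lines() {
  while :; do
    follow_once "$1"
    sleep "${2:-10}"
  done
}
```

Running follow_lines /tmp/foo 1 in one terminal while the question's test loop runs in another should print each number only once, because every rewrite starts with the same lines that were already counted.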

Stéphane Chazelas · Answer 1 · 2022-12-12T17:38:17.093


That's tail trying to be too smart for your use case. Here, you could do:

{
  tail
  while cat; do
    sleep 1
  done
} < /tmp/foo

That's the same as what the original implementation of tail did (when -f was added in SysIII in 1980).
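This works because the shell opens /tmp/foo only once for the whole { ...; } group. tail and every cat then share that file descriptor and its read offset, and an in-place rewrite with > truncates and refills the same file without moving that offset. The small illustration below (demo_shared_offset is a made-up name) shows the effect with two plain cat calls and no sleeping:

```shell
#!/bin/sh
# Illustration (hypothetical name): all commands inside { ...; } < file share
# the file descriptor opened by the redirection, and hence its read offset.
demo_shared_offset() {
  f=$(mktemp) || return 1
  printf '1\n2\n' > "$f"
  {
    cat                        # prints "1" and "2"; the offset is now 4
    printf '1\n2\n3\n' > "$f"  # in-place rewrite: truncate + write, same inode
    cat                        # resumes at offset 4, so only "3" is printed
  } < "$f"
  rm -f "$f"
}

demo_shared_offset   # prints 1, 2 and 3, each only once
```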

It works better with a shell where cat and sleep are builtins, or where there are equivalent builtin commands. Or do it in perl/python...

For instance, in ksh93, sleep is builtin by default and the cat builtin can be enabled with builtin cat. In zsh, you'd use sysread in a loop instead of cat and zselect in place of sleep:

zmodload zsh/zselect
zmodload zsh/system
readall() while sysread -s 65536 -o1; do continue; done
{
  tail
  while readall; do
    zselect -t 100
  done
} < /tmp/foo

If a new file were being created each time, tail -f would not detect it. Even in that case, you could still do it with something like:

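#! /bin/zsh -
zmodload zsh/zselect   # for zselect (sleep with a timeout in 1/100 s)
zmodload zsh/system    # for sysopen, sysread, sysseek and systell
# Copy everything from stdin (fd 0) to stdout (fd 1) until EOF.
readall() while sysread -s 65536 -o1; do continue; done
file=${1-/tmp/foo}
{
  tail                       # print the last 10 lines first
  (( offset = systell(0) ))  # remember where we are in the file
} < $file || exit
while true; do
  # Reopen the file by name (it may be a new one), seek to the saved
  # offset and copy whatever has been appended since.
  if
    sysopen -ru0 -- $file 2> /dev/null &&
      sysseek $offset &&
      readall
  then
    (( offset = systell(0) ))
  fi
  zselect -t 100             # wait 1 second
done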

Here, we reopen the file and seek to the last known offset at each iteration.
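Without zsh, the reopen-and-seek loop can be approximated in POSIX sh. The sketch below is a rough translation and not part of the answer. tail -c +N stands in for sysseek, the byte offset lives in a shell variable instead of being read with systell, and the initial "last 10 lines" step is left out. The names offset, read_new and follow_bytes are illustrative. The offset is advanced by the number of bytes actually printed, not by a separately measured file size, so growth during a call cannot cause bytes to be skipped or repeated.

```shell
#!/bin/sh
# Rough POSIX sh counterpart of the zsh loop above (an assumption, not part
# of the answer): tail -c +N replaces sysseek and $offset replaces systell.
offset=0

# read_new FILE: reopen FILE by name, print the bytes after $offset and
# advance $offset by exactly the number of bytes printed.
read_new() {
  chunk=$(mktemp) || return 1
  if tail -c "+$((offset + 1))" "$1" > "$chunk" 2> /dev/null; then
    offset=$((offset + $(wc -c < "$chunk")))
    cat "$chunk"
  fi
  rm -f "$chunk"
}

# follow_bytes FILE: poll FILE once a second, like the zsh script.
follow_bytes() {
  while :; do
    read_new "$1"
    sleep 1
  done
}
```

Like the zsh version, this prints nothing while the file is shorter than the saved offset (for example in the middle of a download) and resumes once it has grown past that point. A genuine truncation at the source is therefore not detected.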

answered Dec 08 '22 at 17:18 by Stéphane Chazelas, edited Dec 12 '22 at 17:38

Is that meant to be a shell function? – Axel Beckert Dec 08 '22 at 17:21

@AxelBeckert, that's meant to be shell code to be used in place of tail -f /tmp/foo. – Stéphane Chazelas Dec 08 '22 at 17:22
Will try, thanks! – Axel Beckert Dec 08 '22 at 17:25
Since the OP (@AxelBeckert) said the inode of the file might change, I don't think this would work for him: the file descriptor for the file is kept open, and it will remain open on the old file even if that file is deleted and replaced with a new inode. – aviro Dec 12 '22 at 07:47
@aviro, I don't see anything in the OP's question that says the inode might change. It doesn't in the example they've given. If it may change, you could always record the current position and do an open+seek+cat at each iteration. ksh93 and zsh (which are also some of the shells that can have cat/sleep or equivalents builtin) have builtin support for seeking. Or you could use perl. – Stéphane Chazelas Dec 12 '22 at 08:12
"Is a way or tool which does not look at the inode but just at the contents of the file ..." – aviro Dec 12 '22 at 08:14
@aviro GNU tail looks at the inode structure, especially the st_size field, to determine that the file has been truncated, as an extension of what the original tail did. My answer does not do that, and does the same as the SysIII tail did. I don't think you can read into the OP's sentence that the inode may change (or, more precisely, that the directory entry could refer to different files). – Stéphane Chazelas Dec 12 '22 at 08:21
@StéphaneChazelas: Actually I'm not 100% sure if it really does, but the behaviours of since, tail -F and tail -f suggested it to me. My examples with > resulted in the same error messages, so I assumed they changed the inode as well. The tool which actually caused the issue is Microsoft's azcopy, which has to be used to download Azure log files. – Axel Beckert Dec 12 '22 at 10:14

@AxelBeckert it's easy to check if the inode changes, by running stat or ls -li on the file before and after. – aviro Dec 12 '22 at 13:01
@aviro, if the inode changed, tail -f would not detect it and would carry on tailing the (possibly renamed or deleted) original file. – Stéphane Chazelas Dec 12 '22 at 17:39
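Following aviro's suggestion, one quick way to check whether a given kind of rewrite changes the inode is to compare ls -i output before and after. The sketch below uses a hypothetical helper, inode_of, and a temporary file from mktemp. It contrasts the in-place > rewrite from the question's test loop with writing a new file and renaming it over the old name. The first keeps the inode; the second cannot, because both files exist at the same time when the rename happens. The same before/after comparison can be wrapped around a real download.

```shell
#!/bin/sh
# Hypothetical helper: print the inode number of a file.
# POSIX "ls -i" prints the inode number before the file name.
inode_of() {
  ls -i "$1" | awk '{ print $1 }'
}

f=$(mktemp) || exit 1
printf '1\n2\n' > "$f"
before=$(inode_of "$f")

# In-place rewrite with ">", as in the question's test loop:
# the file is truncated and written again, but it is still the same file.
printf '1\n2\n3\n' > "$f"
after_rewrite=$(inode_of "$f")

# Replacement: write a new file, then rename it over the old name.
printf '1\n2\n3\n4\n' > "$f.new" && mv "$f.new" "$f"
after_replace=$(inode_of "$f")

if [ "$before" = "$after_rewrite" ]; then
  echo "in-place rewrite: same inode ($before)"
else
  echo "in-place rewrite: inode changed"
fi
if [ "$after_rewrite" = "$after_replace" ]; then
  echo "rename over: same inode"
else
  echo "rename over: inode changed ($after_rewrite -> $after_replace)"
fi
rm -f "$f"
```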
