
I have a script that listens to Twitter and stores tweets with a certain keyword in a JSON file. A new destination file is created every time the script starts.

Sometimes my script crashes and automatically restarts, creating a new JSON file in the process.

I would like to show a running log of the incoming tweets. With a single file I can do this with (piping to jq to show only a single field from the JSON):

tail -f file1.json | jq '.text'

However, once the script has crashed and restarted, a new file is created (e.g. file2.json) and the above command listens to a file which is no longer updated.

To work around this issue I thought I could concatenate all files in the directory and run tail -f | jq '.text' on the result.

However, while I can run cat * to concatenate all files currently in the folder, files created afterwards are not automatically added to the concatenation.

How can I continuously concatenate all files in a folder, such that I can always see the latest rows of the newest file?

  • The issue seems to be that your Twitter script creates a file with a new name rather than writing to a file with a predetermined name. What about changing the behavior of your Twitter script to instead move the old output file away and then write to the output file's old name? – Kusalananda Mar 02 '21 at 16:41
  • You could also do something like script.sh | tee -a filename | jq .... – Eduardo Trápani Mar 02 '21 at 16:51
  • At first glance multitail looks promising (e.g. multitail -iw …). But no. – Kamil Maciorowski Mar 02 '21 at 22:36
  • @Kusalananda, definitely the long term solution is to fix the script and try/catch the exceptions or handle the file switching in there. However, I don't have the time for now to dive into that script (which I got from another person), so I'm looking for a band-aid solution. – Saaru Lindestøkke Mar 03 '21 at 11:11
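Sketching out the tee -a suggestion from the comments (an assumption: the collector script can write each tweet as a line of JSON to stdout; tweet_listener.sh and tweets.json are placeholder names):

    # tee -a appends everything to one stable file, so nothing is lost
    # across restarts of the pipeline, while jq shows the running log.
    ./tweet_listener.sh | tee -a tweets.json | jq '.text'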

3 Answers


This solution uses tail -F (i.e. tail --follow=name --retry), which is not portable. Tested with GNU tail.

Proceed as follows:

  1. Create monitored as a regular file:

    : >>monitored
    
  2. Periodically check if there is a file*.json file newer than monitored. If so, replace monitored with a hardlink to the file:

    while sleep 1; do
        # \( -name . -o -prune \) keeps find out of subdirectories (a portable
        # stand-in for -maxdepth 1); any file*.json newer than 'monitored'
        # becomes the new target of the hard link.
        find . \( -name . -o -prune \) -name 'file*.json' -newer monitored -exec ln -f {} monitored \;
    done
    

    The above loop can be run in the background.

  3. Monitor monitored by following its name:

    tail -F monitored
    

    You can pipe to jq '.text' or whatever.
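Putting the three steps together, a minimal sketch (assuming GNU tail and jq are available, and that the JSON files sit in the current directory):

    #!/bin/sh
    # Step 1: make sure 'monitored' exists as a regular file.
    : >>monitored

    # Step 2, backgrounded: whenever a file*.json is newer than 'monitored',
    # make 'monitored' a hard link to it (same inode, so appends flow through).
    while sleep 1; do
        find . \( -name . -o -prune \) -name 'file*.json' -newer monitored \
            -exec ln -f {} monitored \;
    done &

    # Step 3: follow by name; tail -F reopens when the relink swaps the inode.
    tail -F monitored | jq '.text'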

Notes:

  • Your explicit question is:

    How can I continuously concatenate all files in a folder, such that I can always see the latest rows of the newest file?

    My solution does not "continuously concatenate". It allows you to "see the latest rows of the newest file" though.

  • If many file*.json files were created or updated in the same second, there would be no guarantee that each of them appears as monitored, even if only for a moment; some could be skipped. However, I understand your script crashes and restarts only occasionally, so it probably takes more than one second from one restart to the next. In that case there is no problem.

    In general this may be a problem: the solution is not fully reliable when the "newest file" status jumps from file to file too frequently.


If you want to watch all files in the current directory, use this construction:

ls | cat - <(inotifywait -m --format '%f' -e create .) | while read -r file; do tail -v -f "$file" & done

Note that it does not support having subdirectories in the watched directory.
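For comparison, a variant in the same spirit that avoids parsing ls (the concern raised in the comment below); a sketch, assuming the watched directory holds only regular files with unsurprising names:

    # Start a tail for every file that already exists...
    for f in ./*; do tail -v -f "$f" & done
    # ...then start one for every file created afterwards.
    inotifywait -m --format '%f' -e create . | while read -r file; do
        tail -v -f "$file" &
    done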

jiwopene
    Related: https://unix.stackexchange.com/questions/128985/why-not-parse-ls – Kusalananda Mar 02 '21 at 16:54
  • @Kusalananda, +1, but I am using it here for simplicity. I suppose that it will be used only as one-time solution and the user has “safe” environment (no unknown or unchecked file names etc.). – jiwopene Mar 02 '21 at 18:23

Kamil's answer pointed me in the right direction, i.e. the use of hard links and tail -F.

I went with the following solution:

  1. With watch, re-create a hard link to the latest file periodically:

    watch -n 300 'ln -f "$(find . -name "file*.json" | sort --reverse | head -n1)" ./latest.json'

    (Restricting find to the file*.json pattern keeps latest.json itself out of the candidates.)

  2. Use tail -F to continuously see the latest lines of the hard-linked file:

    tail -F latest.json | jq '.text'
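One caveat about the naming scheme (an assumption on my part that the files are numbered file1.json, file2.json, …): plain lexicographic sort puts file10.json before file2.json, so once the counter passes 9 the "latest" pick goes wrong. GNU sort's version sort handles numbered files; in ascending order, tail -n1 picks the highest:

    watch -n 300 'ln -f "$(find . -name "file*.json" | sort --version-sort | tail -n1)" ./latest.json'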