
Okay, I think this is possible, but I can't quite figure it out. This is the situation.

A folder contains the log files of all the processes on my robot. The structure looks sort of like this:

$ ls -lrt
total 8
drwxrwxr-x 2 per per 4096 nov  3 12:46 launch01
-rw-rw-r-- 1 per per    0 nov  3 12:47 camera112.log
-rw-rw-r-- 1 per per    0 nov  3 12:47 motors121.log
-rw-rw-r-- 1 per per    0 nov  3 12:47 lidar111.log
drwxrwxr-x 2 per per 4096 nov  3 12:49 launch02
-rw-rw-r-- 1 per per    0 nov  3 12:49 motors122.log
-rw-rw-r-- 1 per per    0 nov  3 12:49 lidar211.log
-rw-rw-r-- 1 per per    0 nov  3 12:49 camera113.log

The files camera112.log, motors121.log and lidar111.log are associated with the logs in folder launch01. I would like to write a script that gets all the files that belong to a specific launch and tar them into one tarball. Since timestamps can change between slightly by files and the numbers in the files are only nearly related, I think the best way to gather all relevant files is to get all files which are below launch01 (inclusive), up to the next directory in the list (exclusive). The number of files can vary, as can the time stamps and names. What is consistent is the folder, then a bunch of files, then the next folder, then files, etc. Ultimately, I would like to get the latest set of logs easily.

Unsure of the approach here. Any ideas how to go about this?

Clarifications:

  • Number of files can vary.
  • The exact timestamp is not reliable (as above, the folder launch01 is different than camera112.log) but relative timestamps work fine. For instance, if I could tar all files from launch01 (inclusive) to launch02 (exclusive) in the list provided by ls -lrt, that works great.
  • Welcome to the site. Please elaborate what you mean by "timestamps can change between slightly by files". Do you mean the timestamps are no reliable means to associate the files belonging together? The sort order of the ls -lrt command uses the timestamps, so if you can't rely on them ... – AdminBee Nov 03 '21 at 13:00
  • "all files which are below launch01" presumably you mean "all files that are newer than launch01"? Above and below have only visual meaning – Chris Davies Nov 03 '21 at 14:32
  • Adding to @Theophrastus's comment, maybe there's another way of linking the files to the folders that doesn't rely on something as unreliable as the dates, if you can think of one. – schrodingerscatcuriosity Nov 03 '21 at 15:17
  • @roaima - "all files which are below launch01" - I took that to be referring to the visual of the output of ls -lrt... so, below launch01/ and above launch02/ ("up to the next directory in the list") – Greenonline Nov 03 '21 at 15:25
  • @Greenonline oh yes, I completely missed that possibility; I was looking at the set of files shown in the question – Chris Davies Nov 03 '21 at 15:31
  • I have a Perl solution for something very similar to this that I faced earlier in the year, which is a sort of look ahead loop, with a "previous value" variable used to backstep - which I could modify and post, if you aren't looking for a bash only solution. However, it would be better to just rename the files - if you have control over the log writing code - to match the launch<n>, i.e. camera<n>112.log, etc.. Also, how sure are you that a log for launch01 could never appear after launch02? Is it logically impossible, w.r.t. thread timings, or whatever? – Greenonline Nov 04 '21 at 02:32

1 Answer


Splitting the task into chunks, using your input of

drwxrwxr-x 2 per per 4096 nov  3 12:46 launch01
-rw-rw-r-- 1 per per    0 nov  3 12:47 camera112.log
-rw-rw-r-- 1 per per    0 nov  3 12:47 motors121.log
-rw-rw-r-- 1 per per    0 nov  3 12:47 lidar111.log
drwxrwxr-x 2 per per 4096 nov  3 12:49 launch02
-rw-rw-r-- 1 per per    0 nov  3 12:49 motors122.log
-rw-rw-r-- 1 per per    0 nov  3 12:49 lidar211.log
-rw-rw-r-- 1 per per    0 nov  3 12:49 camera113.log

Create the "ordered" list of the filenames only

Use either one of these:

ls -lrt | tr -s ' ' | cut -d' ' -f9
ls -lrt | awk '{print $9}'

gives the following, preceded by one blank line that comes from the total 8 header of ls -l:

launch01
camera112.log
motors121.log
lidar111.log
launch02
motors122.log
lidar211.log
camera113.log
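Picking field 9 out of ls -l breaks on names that contain spaces, and the header line turns into a blank entry. As a hedged alternative, plain ls -rt prints the names only, in the same oldest-first order. The sketch below assumes names without newlines; the scratch directory and the timestamps are made up to mirror the listing in the question.

```shell
#!/bin/bash
# Build a throwaway directory that mimics the layout in the question.
# "demo" is a hypothetical scratch directory, not part of the original setup.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir launch01 launch02

# touch -t creates missing files and sets an explicit modification time,
# so every entry gets a distinct timestamp and the order is deterministic.
touch -t 202111031246.00 launch01
touch -t 202111031247.01 camera112.log
touch -t 202111031247.02 motors121.log
touch -t 202111031247.03 lidar111.log
touch -t 202111031249.00 launch02
touch -t 202111031249.01 motors122.log
touch -t 202111031249.02 lidar211.log
touch -t 202111031249.03 camera113.log

# Names only, oldest first: no "total" header and no field counting.
names=$(ls -rt)
printf '%s\n' "$names"
```

With these timestamps the output is the same eight names as above, without the leading blank line.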

Farm the list off into sections

Modifying this answer to Split one file into multiple files based on delimiter, create a file called awk_pattern containing the following:

BEGIN{ fn = "part1.txt"; n = 1 }
{
   if (substr($0,1,6) == "launch") {
       close (fn)
       n++
       fn = "part" n ".txt"
   }
   print > fn
}

and then running

ls -lrt | awk '{print $9}' | awk -f awk_pattern

gives the required output:

part1.txt

(a single blank line, which comes from the total 8 header line printed by ls -l)

and then

part2.txt

launch01
camera112.log
motors121.log
lidar111.log

part3.txt

launch02
motors122.log
lidar211.log
camera113.log

The first file (part1.txt) should be discarded, as it contains only one line and names no log files:

rm part1.txt
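The throwaway file can also be avoided at the source. The following is a minimal sketch of the same splitting idea in which nothing is written until the first launch entry has been seen. Note that the numbering then shifts by one compared with the output above: part1.txt holds the launch01 group and part2.txt the launch02 group. The sample list is fed from a here-document, and the scratch directory is hypothetical.

```shell
#!/bin/bash
# Work in a hypothetical scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

awk '
/^launch/ {                   # a launch directory starts a new group
    if (fn != "") close(fn)   # finish the previous part file, if any
    fn = "part" (++n) ".txt"
}
fn != "" { print > fn }       # ignore anything listed before the first launch
' <<'EOF'

launch01
camera112.log
motors121.log
lidar111.log
launch02
motors122.log
lidar211.log
camera113.log
EOF

ls part*.txt
```

The leading blank line in the sample input stands in for the header line of ls -l and is simply dropped.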

tar the contents of each part

From section 6.3, Reading Names from a File, of the GNU tar manual:

tar -c -v -z -T part2.txt -f part2.tgz
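To check this step without the real logs, a small sketch along these lines could be used. It creates dummy files in a scratch directory, archives them from a list file and prints the archive contents. The directory, the dummy files and launch01/launch.log are all hypothetical stand-ins.

```shell
#!/bin/bash
# Hypothetical scratch directory with dummy stand-ins for the real logs.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir launch01
touch launch01/launch.log camera112.log motors121.log lidar111.log

# The list file, one name per line, as produced by the splitting step.
printf '%s\n' launch01 camera112.log motors121.log lidar111.log > part2.txt

# -T reads the names to archive from the list file.
tar -c -z -T part2.txt -f part2.tgz

# List what ended up in the archive.
contents=$(tar -t -z -f part2.tgz)
printf '%s\n' "$contents"
```

Because launch01 is a directory, naming it in the list pulls in everything below it, which is what "inclusive" in the question asks for.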

Looping through the tar files

for part_file in part*.txt
do
  tar_file=${part_file%.*}
#  tar_file=$(basename "${part_file}" .txt)
  tar -c -v -z -T "${part_file}" -f "${tar_file}.tgz"
done

This should give

part1.tgz
part2.tgz
part3.tgz

Again, part1.tgz should be discarded:

rm part1.tgz

Putting it all together

#!/bin/bash

ls -lrt | awk '{print $9}' | awk -f awk_pattern

for part_file in part*.txt
do
  tar_file=${part_file%.*}
  tar -c -v -z -T "${part_file}" -f "${tar_file}.tgz"
done

rm part1.txt
rm part1.tgz

As just one script (incorporating the awk pattern)

#!/bin/bash

ls -lrt | awk '{print $9}' | awk '
BEGIN{ fn = "part1.txt"; n = 1 }
{
   if (substr($0,1,6) == "launch") {
       close (fn)
       n++
       fn = "part" n ".txt"
   }
   print > fn
}'

for part_file in part*.txt
do
  tar_file=${part_file%.*}
  tar -c -v -z -T "${part_file}" -f "${tar_file}.tgz"
done

rm part1.txt
rm part1.tgz

This (hopefully) should work, although I have only tested the first two steps, i.e. up to the tar part, as I don't have the files to tarball up.


Possible improvements:

  1. Post-processing: Remove the part*.txt files (rm part*.txt)

  2. Post-processing: Remove the log files once tar'd up (rm *.log)

  3. Post-processing: Remove the directories once tar'd up (rm -R -- */)

    See this answer to How do I remove all sub-directories from within a directory?.

  4. Prevent awk from producing the useless part1.txt file

  5. Save the tar files elsewhere (... -f ${tar_path}/${tar_file}.tgz)

  6. Don't use intermediary part*.txt files.
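Items 4 to 6 can be combined for the "latest set of logs" case from the question. The following is only a sketch under assumptions: GNU tar (for reading the list from standard input with -T -), names without spaces or newlines, and a tarball written outside the log directory so it never appears in the listing. The demo and out directories, the made-up timestamps and the name latest.tgz are hypothetical.

```shell
#!/bin/bash
# Hypothetical scratch directories: one for dummy logs, one for the tarball.
demo=$(mktemp -d)
out=$(mktemp -d)
cd "$demo" || exit 1

# Recreate the layout from the question with increasing timestamps.
i=0
for name in launch01 camera112.log motors121.log lidar111.log \
            launch02 motors122.log lidar211.log camera113.log
do
  case $name in launch*) mkdir "$name" ;; esac
  i=$((i + 1))
  touch -t "$(printf '2021110312%02d.00' "$i")" "$name"
done

# Walk the oldest-first list and reset the buffer at every launch directory,
# so only the newest launch and the entries after it are left at the end.
latest=$(ls -rt | awk '/^launch/ { buf = "" }
                       { buf = buf $0 "\n" }
                       END { printf "%s", buf }')

# No part*.txt files: the list goes straight to tar on standard input.
printf '%s\n' "$latest" | tar -c -z -T - -f "$out/latest.tgz"
tar -t -z -f "$out/latest.tgz"
```

This relies on the same ordering assumption as the rest of the answer: a log that belongs to an older launch but is modified after the next launch directory would end up in the wrong group.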

Greenonline