
I have some log files named:

  • 2016-02-10_03-52.log
  • 2016-02-10_04-43.log
  • 2016-02-10_02-13.log
  • ...
  • 2016-03-15_07-03.log
  • 2016-03-15_09-08.log

Basically, the pattern is: YYYY-MM-DD_*.log

I would like to create one tar archive per date, containing all files that start with the same date prefix, like:

  • 2016-02-10.tar
  • ...
  • 2016-03-15.tar

The problem is that I don't know the dates in advance, only the structure of the names.

I don't know how to search for files starting with the same (unknown) prefix.

Any help is much appreciated. Thank you.

Update, following Nominal Animal's solution below:

export LANG=C LC_ALL=C
find . -name '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_*' -printf '%f\n' | sed -e 's|_.*$||g' | sort | uniq | while read NAME ; do
    find . -name "${NAME}_*.log" -printf '%p\n' | tar -cJf "${NAME}.xz" -T - --no-unquote
done

  • You have to put some limitation on the pattern. Say, "it starts with 2016, followed by a -, which is followed by a 2-digit month, followed by a 2-digit day of the month, followed by the string _log." Otherwise you cannot filter out the files that you want, and your question is too vague. – MelBurslan Apr 01 '16 at 13:36
  • I updated my question above – chris3389 Apr 01 '16 at 13:41

2 Answers


Here is a very simple two-step process to do exactly this.

First, use find to generate the list of all files that should end up archived, and use sed to turn each file name into its archive name (the date prefix). Filter the output through sort and uniq so that each archive name appears exactly once. For example:

find . -name '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_*' -printf '%f\n' | sed -e 's|_.*$||g' | sort | uniq

Note that we use the %f format above to get the file names only, not the full paths.
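The -printf action is a GNU find extension. Where it is not available, the same list of archive names can be sketched with a shell glob and parameter expansion, assuming the logs sit directly in the current directory (unlike find, this does not recurse). The function name archive_names is hypothetical, and the first lines only build a scratch directory with names from the question for demonstration:

```shell
#!/bin/sh
# Demo setup only: a scratch directory with sample names from the question.
cd "$(mktemp -d)" || exit 1
touch 2016-02-10_03-52.log 2016-02-10_04-43.log 2016-03-15_07-03.log

# Hypothetical helper: print each distinct date prefix once.
# Assumes the logs are in the current directory (no recursion).
archive_names() {
    for f in [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_*.log; do
        [ -e "$f" ] || continue        # the glob matched nothing
        printf '%s\n' "${f%%_*}"       # drop everything from the first _
    done | LC_ALL=C sort -u            # same effect as sort | uniq
}

archive_names
```

With the sample files above, this prints 2016-02-10 and 2016-03-15, one per line.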

Next, we pipe that through a small bash loop that reads each archive name, uses find again to locate the log files for that date, and pipes that list to tar, which generates the archive.

For running such commands, I like to ensure we are using the C/POSIX locale (no localized error messages or other formatting). That is done by setting the LANG and LC_ALL environment variables to C. So, the entire command sequence I'd use is:

export LANG=C LC_ALL=C
find . -name '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_*' -printf '%f\n' | sed -e 's|_.*$||g' | sort | uniq | while read NAME ; do
    find . -name "${NAME}_*.log" -printf '%p\n' | tar -cJf "${NAME}.tar.xz" -T - --no-unquote
done

The -J parameter in -cJf selects XZ compression (it compresses well, so you probably do want it); I like to read -cJf as "create XZ archive file". The -T - means the list of files for each archive is read from standard input, and --no-unquote means those file names are taken raw, not quoted.
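To see what -T - does in isolation, here is a small sketch with compression left out (no J), so no xz binary is needed. It feeds two names on standard input, tar archives exactly those, and -t then lists the members. The scratch directory and sample names are for demonstration only:

```shell
#!/bin/sh
# Demo setup only: a scratch directory with sample names from the question.
cd "$(mktemp -d)" || exit 1
touch 2016-02-10_03-52.log 2016-02-10_04-43.log 2016-03-15_07-03.log

# -T - : read the list of files to archive from standard input.
printf '%s\n' 2016-02-10_03-52.log 2016-02-10_04-43.log |
    tar -cf 2016-02-10.tar -T -

# -t : list the archive members; only the two names given above appear.
tar -tf 2016-02-10.tar
```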

Note that the archive names are very suitable for globbing here (that is, we can pass them directly to find -name ...). If they contained *, ?, [, or ], we'd need to escape those characters. Doable, but annoying. The OP has chosen the file name pattern extremely well, in my opinion.
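For the escaping case, a minimal sketch: a backslash before *, ?, [ or ] makes find -name match that character literally. glob_escape is a hypothetical helper name, and the odd file names are made up to show how an unescaped pattern misbehaves:

```shell
#!/bin/sh
# Demo setup only: a made-up name containing glob metacharacters,
# plus a decoy that the unescaped pattern matches instead.
cd "$(mktemp -d)" || exit 1
touch 'weird[1]*name_01-00.log' 'weird1xname_01-00.log'

# Hypothetical helper: backslash-escape * ? [ ] and \ in a literal string.
glob_escape() {
    printf '%s\n' "$1" | sed 's/[][*?\\]/\\&/g'
}

NAME='weird[1]*name'
# Unescaped: [1] and * act as wildcards, so only the decoy matches.
find . -name "${NAME}_*.log"
# Escaped: only the file whose name really contains [1]* matches.
find . -name "$(glob_escape "$NAME")_*.log"
```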

  • Just fixed typos and it worked fine. Thank you very much.

    – chris3389 Apr 02 '16 at 08:19

Given that tar has an "append" option (-r), you can Keep It Stupid Simple:

for file in *.log; do tar -rf "${file%%_*}.tar" "$file" ; done

You can't include the z option to compress the logs with this particular approach (tar: Cannot update compressed archives), but boy, is it simple.

Add robustness to the globbing pattern according to your needs, of course. This version assumes that all .log files should be tarred in one archive or another.
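One way to add that robustness and still end up with compressed archives is sketched below, assuming GNU tar and gzip: a stricter date glob for the append loop, then a separate pass that compresses each finished .tar, since tar cannot update a compressed archive. The scratch-directory lines are for demonstration only:

```shell
#!/bin/sh
# Demo setup only: a scratch directory with sample names from the question.
cd "$(mktemp -d)" || exit 1
touch 2016-02-10_03-52.log 2016-02-10_04-43.log 2016-03-15_07-03.log

# Append each log to its per-day archive; only YYYY-MM-DD_*.log names match.
for file in [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_*.log; do
    [ -e "$file" ] || continue            # the glob matched nothing
    tar -rf "${file%%_*}.tar" "$file"     # GNU tar creates the archive if missing
done

# Compress afterwards: 2016-02-10.tar becomes 2016-02-10.tar.gz.
for archive in [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].tar; do
    [ -e "$archive" ] || continue
    gzip "$archive"
done
```

This is meant as a one-shot run: a second run would start new .tar files next to the existing .tar.gz ones, and gzip will not silently overwrite an existing .tar.gz.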

Wildcard