
I've got a long-running command that generates a lot of output on stdout. I'd like to be able to retain, for instance, only the last three days or the last gibibyte (avoiding cutting lines in the middle), and, if possible, in file chunks not larger than 20 MiB. Each file chunk would be named with a numeric suffix or a timestamp.

Something like:

my-cmd | magic-command --output-file-template=my-cmd-%t \
                       --keep-bytes=1G \
                       --keep-time=3d \
                       --max-chunk-size=20M \
                       --compress=xz

Would write:

my-cmd-2014-09-05T10:04:23Z

When a file reaches 20M, it would compress it and open a new one, and so on; after a while it would start deleting the oldest files.

Does such a command exist?

I'm aware of logrotate and its ability to manage files written by other applications, but I'm looking for something simpler that doesn't involve having to set up a cron job, specify rules, suspend the process, etc.

6 Answers


You can get some of what you want via pipelog, which "allows for rotating or clearing the log of a running process by piping it through an intermediate which responds to external signals", e.g.:

spewstuff | pipelog spew.log -p /tmp/spewpipe.pid -x "gzip spew.log.1"

You can then get the pid from /tmp/spewpipe.pid, and:

kill -s USR1 $(</tmp/spewpipe.pid)

But you would have to set that up with cron or something. There's one catch to this, however. Notice I gzip spew.log.1 -- this is because the -x command is executed after the log is rotated. So you have the further problem of overwriting spew.log.1.gz each time unless you write a short script to do the gzip and move the file afterward, and use that as the -x command.
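
For example, a tiny helper script (just a sketch of that idea, not part of pipelog; the file names and timestamp format are my assumptions) could compress the rotated file and move it out of the way so nothing is overwritten:

#!/bin/sh
# hypothetical helper to use as pipelog's -x command:
# compress the freshly rotated chunk and stamp it so successive
# rotations don't clobber spew.log.1.gz
gzip spew.log.1 &&
mv spew.log.1.gz "spew.log.$(date -u +%Y-%m-%dT%H:%M:%SZ).gz"

Pair that with a crontab entry along the lines of 0 * * * * kill -s USR1 $(cat /tmp/spewpipe.pid) to trigger a rotation every hour.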

Full disclosure: I wrote this, so it of course works perfectly. ;) I will keep a compress option in mind, or something that better facilitates it, for version 0.2 (the intended purpose of -x is somewhat different, but it will work as above). Also automated rollover is a good idea...the first version is intentionally minimal as I resisted the temptation to add features that weren't necessary (it is not so hard to set up a cron job for this, after all).

Note that it's intended for text output; if there are potential null bytes, you should use -z -- which replaces the zero with something else. This was a tradeoff to simplify the implementation.

goldilocks

Dan Bernstein's multilog can apparently do this - or perhaps most of it, while providing an outlet via file descriptors to !processor to make up the difference as you like - though the 20M/1G size specifications may take some finagling, as 16M seems to be its outside limit per log file. What follows is, for the most part, a copy+paste selection from the link above, though that link also details other options such as timestamping per line, maintaining [an]other file[s] containing only the most recent line matching a pattern, and more.

Interface

 multilog script

...script consists of any number of arguments. Each argument specifies one action. The actions are carried out in order for each line of input.

Selecting lines

Each line is initially selected. The action...

-pattern

...deselects the line if pattern matches the line. The action...

+pattern

selects the line if pattern matches the line.

...pattern is a string of stars and non-stars. It matches any concatenation of strings matched by all the stars and non-stars in the same order. A non-star matches itself. A star before the end of pattern matches any string that does not include the next character in pattern. A star at the end of pattern matches any string.

Automatically rotated logs

If dir starts with a dot or slash then the action...

 dir

...appends each selected line to a log named dir. If dir does not exist, multilog creates it.

The log format is as follows:

  1. dir is a directory containing some number of old log files, a log file named current, and other files for multilog to keep track of its actions.

  2. Each old log file has a name beginning with @, continuing with a precise timestamp showing when the file was finished, and ending with one of the following codes:

    • .s: This file is completely processed and safely written to disk.
    • .u: This file was being created at the moment of an outage. It may have been truncated. It has not been processed.

The action...

 ssize

...sets the maximum file size for subsequent dir actions. multilog will decide that current is big enough if current has size bytes. (multilog will also decide that current is big enough if it sees a newline within 2000 bytes of the maximum file size; it tries to finish log files at line boundaries.) size must be between 4096 and 16777215. The default maximum file size is 99999.

In versions 0.75 and above: If multilog receives an ALRM signal, it immediately decides that current is big enough, if current is nonempty.

(Note: I suspect the zsh schedule builtin could be easily persuaded to send an ALRM at specified intervals if necessary.)
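
(Continuing that thought, here is a minimal sketch of the idea -- it assumes the zsh/sched module and that multilog's PID has been captured in $multilog_pid; none of this comes from the multilog documentation quoted here:)

zmodload zsh/sched
# hypothetical: ask multilog to finish the current file once an hour,
# then re-arm the timer; $multilog_pid is assumed to hold multilog's PID
rotate() { kill -ALRM $multilog_pid; sched +1:00 rotate }
sched +1:00 rotate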

The action...

 nnum

...sets the number of log files for subsequent dir actions. After renaming current, if multilog sees num or more old log files, it removes the old log file with the smallest timestamp. num must be at least 2. The default number of log files is 10.

The action...

 !processor

...sets a processor for subsequent dir actions. multilog will feed current through processor and save the output as an old log file instead of current. multilog will also save any output that processor writes to descriptor 5, and make that output readable on descriptor 4 when it runs processor on the next log file. For reliability, processor must exit nonzero if it has any trouble creating its output; multilog will then run it again. Note that running processor may block any program feeding input to multilog.
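
To tie this back to the question, an invocation along the following lines could approximate what was asked for (the directory name and the file count are my assumptions, and the 16M per-file ceiling means exact 20 MiB chunks are out of reach):

my-cmd | multilog s16777215 n64 '!gzip' ./my-cmd-logs

Here s16777215 caps each file just under 16 MiB, n64 keeps at most about 64 old files (roughly 1 GiB before compression), and !gzip compresses each finished file before it is saved as an old log file.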

mikeserv

Here is a hacked-up Python script that does something like what you are requesting:

#!/bin/sh
''':'
exec python "$0" "$@"
'''

KEEP = 10               # number of rotated chunks to cycle through
MAX_SIZE = 1024         # bytes per chunk before rotating
LOG_BASE_NAME = 'log'   # chunks are named log.0, log.1, ...

from sys import stdin
from subprocess import call

log_num = 0
log_size = 0
log_name = LOG_BASE_NAME + '.' + str(log_num)
log_fh = open(log_name, 'w', 1)     # line-buffered

while True:
        line = stdin.readline()
        if len(line) == 0:
                # EOF on stdin: compress the final chunk and stop
                log_fh.close()
                call(['gzip', '-f', log_name])
                break
        log_fh.write(line)
        log_size += len(line)
        if log_size >= MAX_SIZE:
                # chunk is full: compress it and open the next one,
                # wrapping around so the oldest chunk gets overwritten
                log_fh.close()
                call(['gzip', '-f', log_name])
                if log_num < KEEP:
                        log_num += 1
                else:
                        log_num = 0
                log_size = 0
                log_name = LOG_BASE_NAME + '.' + str(log_num)
                log_fh = open(log_name, 'w', 1)
Mark Wagner
    Is there a reason to have it as a shell script that execs python as the first thing instead of using the python or env python hashbang? – peterph Sep 29 '14 at 19:51

The best I could find so far as an approximation that doesn't involve writing huge pieces of code is this zsh code:

autoload zmv
mycmd |
  while head -c20M > mycmd.log && [ -s mycmd.log ]; do
    zmv -f '(mycmd.log)(|.(<->))(|.gz)(#qnOn)' '$1.$(($3+1))$4'
    {rm -f mycmd.log.1 mycmd.log.50.gz; (gzip&) > mycmd.log.1.gz} < mycmd.log.1
  done

Here head carves the stream into chunks of at most 20 MiB, zmv shifts the numbered files up by one, and gzip compresses the just-finished chunk in the background, keeping at most 51 files (the live mycmd.log plus mycmd.log.1.gz through mycmd.log.50.gz) of at most 20 MiB each.


On Linux with systemd, if you have root privileges, you can start a dedicated journald instance; this requires LogNamespace= in the service's unit file.

/etc/systemd/journald@myprogram.conf:

[Journal]
Storage=volatile
RuntimeMaxUse=20M

/etc/systemd/system/myprogram.service:

[Unit]
Description=My Test Service
[Service]
ExecStart=/home/myuser/myprogram
LogNamespace=myprogram
User=myuser

Then reload systemd, start the service, and follow its journal:

systemctl daemon-reload
systemctl start myprogram
journalctl --namespace=myprogram --follow

See https://www.freedesktop.org/software/systemd/man/systemd-journald.service.html#Journal%20Namespaces
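
journald.conf also has a MaxRetentionSec= setting that could cover the question's three-day requirement; a possible variant of the namespace config above (the MaxRetentionSec=3day line is my addition and its value syntax an assumption, not part of the original answer):

[Journal]
Storage=volatile
RuntimeMaxUse=20M
MaxRetentionSec=3day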

basin

Since such a command does not really exist, I wrote a basic FUSE driver that does just that:

Stack Overflow post: ring-buffer-log-file-on-unix

Repository of circFS

Mount the circFS fuse driver:

circfs storage_dir mountpoint max_size [fuse options]

And write your logs to mountpoint/filename

It keeps storage_dir/filename at max_size by writing to that file in a circular manner. When the log is closed, the circular file is rewritten in order, but you can also look at it while the driver is running; you just need to compute the start offset from the apparent size of the log and max_size.
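
A concrete session might look like this (the paths and the plain byte count for max_size are assumptions based on the synopsis above):

circfs ~/circfs-storage ~/circfs-mount 1073741824
my-cmd > ~/circfs-mount/my-cmd.log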

Bylon