
I ran a script which acts on multiple "people", and creates output and error files for each. let's say something like this:

output_alice.txt
error_alice.txt
output_bob.txt
error_bob.txt
...

I want a command that will scan all the error files (error_<name>.txt) and echo the ones that have had something written to them (vs being empty), as a quick way to identify which "people" the script exited with an error for. Is there an easy way to do this? I know how to use grep to do this for a string, e.g. grep -r <substring> ., but not how to check if there is anything at all.

abra

5 Answers


Note that bash is not a terminal, it's one of many shells, which are interpreters for some kinds of programming languages specialised in running commands. Like most applications it can work with its input/output connected to a terminal device or any other type of file.

To list the files named error_anything.txt in the current working directory that contain at least one line, in the language of bash and most other Unix shells, you can do:

grep -l '^' error_*.txt

Where ^ is a regular expression that matches at the start of the subject, the subject being each line of the file for grep.

For those with at least one non-empty text line:

grep -l . error_*.txt

Where . matches any single character. Beware that for files encoded in a charmap other than that of the locale, it could fail to match non-empty lines whose contents cannot be decoded as text.

Also note that not all grep implementations will report files that contain only one unterminated line (one missing the trailing line delimiter, like the output of printf text, which prints no final newline).
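A quick way to see the difference between the two patterns; the file names below are just examples in a throwaway directory:

```shell
# '^' vs '.' on sample files.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
: > error_alice.txt                 # truly empty
printf '\n\n' > error_bob.txt       # two empty lines only
printf 'oops\n' > error_carol.txt   # one non-empty line

grep -l '^' error_*.txt   # any line at all: lists bob and carol
grep -l .   error_*.txt   # at least one non-empty line: carol only
```

So '^' treats the newlines-only file as non-empty, while '.' does not.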

Another approach is to look for files that contain at least one byte:

find -L . ! -name . -prune -name 'error_*.txt' -type f -size +0c

Which also has the benefit of ignoring files that are not of type regular (such as directories, sockets...)
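A minimal demonstration of that find command, again with made-up file names in a throwaway directory:

```shell
# Non-empty regular error_*.txt files in the current directory only.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
: > error_alice.txt              # empty: should not be listed
printf 'boom\n' > error_bob.txt  # non-empty: should be listed
mkdir error_dir.txt              # a directory: ignored by -type f

find -L . ! -name . -prune -name 'error_*.txt' -type f -size +0c
```

Only ./error_bob.txt is printed; the empty file fails -size +0c and the directory fails -type f.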

Or with the zsh shell:

print -rC1 -- error_*.txt(N-.L+0)

Where - acts like -L so that for symlinks, the size and type of their targets are considered, . is the equivalent of -type f, L+0 of -size +0c, and N for nullglob, so no error is reported if there's no matching file.

That has the benefit of not including the ./ prefix, of working even if the user name cannot be decoded as text in the locale, and of giving you a (lexically, by default) sorted list.

That one you can extend to only print the user name (the part of the root name of the file after the first _) with:

() { print -rC1 -- ${@#*_}; } error_*.txt(N-.L+0:r)
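If zsh is not available, the same user-name extraction can be sketched in plain POSIX shell with a glob and parameter expansion (the file names are just examples):

```shell
# Print just the user-name part of each non-empty error file,
# using only POSIX globbing and parameter expansion.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'fail\n' > error_alice.txt
: > error_bob.txt   # empty: skipped

for f in error_*.txt; do
  [ -s "$f" ] || continue       # keep only non-empty files
  name=${f#*_}                  # strip everything up to the first "_"
  printf '%s\n' "${name%.txt}"  # strip the .txt extension
done
```

This prints just "alice" for the files above.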

To list error files that have been modified since you ran a command, you can use the -newer predicate of find and compare against a file that has been touched just before running your command:

touch .before
my-command-that-may-write-to-error-files
find -L . ! -name . -prune -name 'error_*.txt' -type f -size +0c -newer .before

In zsh, you can replace the find command with:

print -rC1 -- error_*.txt(N-.L+0e['[[ $REPLY -nt .before ]]'])

With some find implementations, you can replace ! -name . -prune with -mindepth 1 -maxdepth 1, though -maxdepth 1 alone would also work here, as the file at depth 0 (.) matches neither -name 'error_*.txt' nor -type f anyway.
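An end-to-end sketch of the .before workflow; the sleep calls stand in for a real command and only guarantee distinct timestamps on filesystems with coarse time granularity:

```shell
# Only error files written after the .before reference are reported.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'old\n' > error_old.txt    # pre-existing error file
sleep 1
touch .before                     # reference timestamp
sleep 1
printf 'new\n' > error_new.txt    # "written by the command"

find -L . -maxdepth 1 -name 'error_*.txt' -type f -size +0c -newer .before
```

Only ./error_new.txt is printed, since error_old.txt predates .before.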

With the GNU implementation of date and find (that's also the find implementation that introduced the -maxdepth predicate), you can avoid having to create that .before file by doing:

before=$(date +'@%s.%N')
my-command-that-may-write-to-error-files
find -L . -maxdepth 1 -name 'error_*.txt' -type f -size +0c -newermt "$before"

With zsh, you can replace the before=$(date +'@%s.%N') with print -Pv before '@%D{%s.%N}' or before=${(%):-@%D{%s.%N}} or before=@$EPOCHREALTIME (after zmodload zsh/datetime); you could again avoid the call to find by using glob qualifiers, and even the temporary variable by using an anonymous function again, but that becomes significantly more involved:

zmodload zsh/stat
zmodload zsh/datetime
() {
  my-command-that-may-write-to-error-files
  print -rC1 error_*.txt(N-.L+0e['
    stat -F %s.%N -A2 +mtime -- $REPLY && (( $2 > $1 )) '])
} $EPOCHREALTIME

Beware though that on Linux at least, even though the system and filesystems support nanosecond precision, the actual granularity is much coarser. You can even find that the modification time is set upon modification to a value that predates the initial call to date or reference to $EPOCHREALTIME, so those approaches may not work for commands that take less than a centisecond to run. Dropping nanoseconds and replacing > with >=, or -newer with ! -older (if your find implementation supports it, which is unlikely), may be a better approach.

  • thank you! In your answer you mention "Note that some grep implementations will also report files that have a non-line." What is a non-line? – abra Aug 11 '23 at 12:21
  • @abra see edit. Lines are sequences of 0 or more characters delimited by a newline character and that are not longer than the LINE_MAX limit in bytes. So bytes that don't form characters, overlong lines, or the bytes (if any) after the last newline character in the file would form non-lines. – Stéphane Chazelas Aug 11 '23 at 12:26
  • thanks for the clarification. I marked your answer as accepted. The command "grep -l . error_*.txt" is what I needed. Thanks again! – abra Aug 11 '23 at 12:28
  • What a wonderful tutorial for what could have been a brief answer. A lot of fixes for potential issues to watch out for. – Mark Stewart Aug 16 '23 at 03:57

GNU find offers the non-POSIX -empty test matching empty files; simply negate it:

find /path/to/dir -type f -name 'error_*.txt' ! -empty

To avoid searching subdirectories, add -maxdepth 1 after the path.

In POSIX find, checking that the file size is not 0 works:

find /path/to/dir -type f -name 'error_*.txt' ! -size 0
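Both spellings side by side on made-up files (the -empty test requires GNU find; ! -size 0 is portable):

```shell
# Create one empty and one non-empty error file in a throwaway directory.
tmp=$(mktemp -d)
: > "$tmp/error_alice.txt"
printf 'x\n' > "$tmp/error_bob.txt"

find "$tmp" -type f -name 'error_*.txt' ! -empty    # GNU only
find "$tmp" -type f -name 'error_*.txt' ! -size 0   # POSIX-portable
```

Each command prints only the non-empty error_bob.txt.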
FelixJN

Just grep for ., which means any character. Empty files have no characters, so searching for . will show non-empty files. For example:

$ touch empty1 empty2 empty3
$ echo "not empty!" > non_empty
$ ls -l 
total 4
-rw-r--r-- 1 terdon terdon  0 Aug 11 13:13 empty1
-rw-r--r-- 1 terdon terdon  0 Aug 11 13:13 empty2
-rw-r--r-- 1 terdon terdon  0 Aug 11 13:13 empty3
-rw-r--r-- 1 terdon terdon 11 Aug 11 13:13 non_empty

Now, we grep:

$ grep -- . ./*
non_empty:not empty!

And, to get names only:

$ grep -l -- . ./*
non_empty

Note that grep . will not find a file that contains nothing but empty line(s) (one or more \n characters). For that, use grep '^' as suggested in Stéphane's answer.

terdon

Another one-line method for searching for non-empty files.

$ for f in `ls error_*.txt`; do [ -s "${f}" ] && echo ${f} ; done

Where

-s FILE: true if FILE exists and has a size greater than zero.

Explanation

Loop through all files in the current directory matching error_*.txt; the -s test checks whether each file exists and contains something, and if so its name is printed.
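The same -s test can be sketched with the glob used directly, without running ls at all (file names below are just examples):

```shell
# The -s test on globbed file names, no ls involved.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
: > error_alice.txt              # empty
printf 'x\n' > error_bob.txt     # non-empty

for f in error_*.txt; do
  [ -s "$f" ] && printf '%s\n' "$f"   # -s: exists and size > 0
done
```

Only error_bob.txt is printed. Letting the shell expand error_*.txt itself avoids the word-splitting and globbing problems of parsing ls output.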

Iain4D
  • You should never do for f in `ls error_*.txt`. Just use for f in error_*.txt directly. – muru Aug 12 '23 at 04:21
  • @muru So I'm curious, what is the background on why no quote character ? – Iain4D Aug 12 '23 at 06:56
    @Iain4D see the last link from Stéphane's comment above. – muru Aug 12 '23 at 08:41
  • @muru Hmm.. thanks for the link reference. However, it looks like it is about double-quotes. One of the comments there linked to "always use quotes", so I wonder if that implies it is more applicable to for loops? – Iain4D Aug 16 '23 at 21:35
  • @StéphaneChazelas Thanks for the links and background on printf vs echo preferences. I read through the explanation and it makes total sense. I got it. Then got to the end of one of the comments . . . and then it mentioned "for simple basic cases where $var is known text, echo ${var} would suffice for your needs". (facepalm) – Iain4D Aug 17 '23 at 21:49
  • After some testing with the CentOS Stream 9 (RH-based) and Ubuntu 22.04 versions of bash, I also discovered that echo "-n" results in no newline character being printed. However, with an extra space added (before or after), it prints as expected: echo " -n" or echo "-n ".

    On the other hand, echo "\t" is treated as literal text unless you use echo -e "\t"; then special characters are processed and in this case it becomes a TAB character.

    – Iain4D Aug 17 '23 at 21:52

GNU sed only. As an alternative to the grep command:

sed -sn 1F error_*.txt

I have not come across the F command in the man pages, but it works. In particular, I use it to insert file names as the first line of nonempty files: sed -i 1F *
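A minimal sketch of how this behaves, assuming GNU sed and made-up file names:

```shell
# GNU sed: -s treats each input file separately, -n suppresses
# automatic printing, and on line 1 of each file the F command
# prints the current input file name. Empty files have no line 1,
# so their names are never printed.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
: > error_alice.txt              # empty: not reported
printf 'x\n' > error_bob.txt     # non-empty: reported

sed -sn 1F error_*.txt
```

Only error_bob.txt is printed, matching the behavior of grep -l '^'.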

nezabudka