You can take different approaches depending on whether awk treats RS as a single character (like traditional awk implementations do) or as a regular expression (like gawk or mawk do). Empty files are also tricky to handle, as awk tends to skip them.
gawk, mawk or other awk implementations where RS can be a regexp.
In those implementations (for mawk, beware that some OSes like Debian ship a very old version instead of the modern one maintained by @ThomasDickey), if RS contains a single character, the record separator is that character; if RS is empty, awk enters paragraph mode; otherwise, RS is treated as a regular expression.
The solution there is to use a regular expression that can't possibly be matched. Some come to mind, like x^ or $x (x before the start, or after the end). However, some (particularly with gawk) are more expensive than others. So far, I've found ^$ to be the most efficient: it can only match on an empty input, but then there would be nothing to match against.
So we can do:
awk -v RS='^$' '{printf "%s: <%s>\n", FILENAME, $0}' file1 file2...
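As a quick check (the scratch directory and file names here are made up for illustration):

```shell
dir=$(mktemp -d)                 # throwaway directory, hypothetical names
printf 'a\nb\n' > "$dir/f1"      # a two-line file
: > "$dir/f2"                    # an empty file
(cd "$dir" && awk -v RS='^$' '{printf "%s: <%s>\n", FILENAME, $0}' f1 f2)
# f1 comes out as a single record; the empty f2 produces no output at all
```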
One caveat though is that it skips empty files (contrary to perl -0777 -n). That can be addressed with GNU awk by putting the code in an ENDFILE statement instead. But we also need to reset $0 in a BEGINFILE statement, as it would otherwise not be reset after processing an empty file:
gawk -v RS='^$' '
BEGINFILE{$0 = ""}
ENDFILE{printf "%s: <%s>\n", FILENAME, $0}' file1 file2...
traditional awk implementations, POSIX awk
In those, RS is just one character; they don't have BEGINFILE/ENDFILE or the RT variable, and they generally can't process the NUL character.
You would think that using RS='\0' could work then, since they can't process input containing the NUL byte anyway, but no: in traditional implementations, RS='\0' is treated as RS=, which is paragraph mode.
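To see what that paragraph mode does (a quick illustration with any awk): records are separated by runs of blank lines, and newline always acts as a field separator in addition to FS.

```shell
# two paragraphs separated by a blank line; RS= splits on the blank line
printf 'a\nb\n\nc\n' | awk -v RS= '{print NR ": " $1}'
# 1: a
# 2: c
```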
One solution can be to use a character that is unlikely to be found in the input, like \1. In multibyte character locales, you can even use byte sequences that are very unlikely to occur, as they form characters that are unassigned or are non-characters, like $'\U10FFFE' in UTF-8 locales. That's not really foolproof though, and you have a problem with empty files as well.
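For instance, with \1 as the separator (a sketch; any character you trust to be absent from the input would do):

```shell
# RS='\1' makes the whole input a single record in any awk,
# as long as the input never contains a \1 (SOH) byte
printf 'a\nb\nc\n' | awk -v RS='\1' '{printf "%d newlines in one record\n", gsub(/\n/, "&")}'
# 3 newlines in one record
```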
Another solution can be to store the whole input in a variable and process it in the END statement. That means you can only process one file at a time, though:
awk '{content = content $0 RS}
END{$0 = content
printf "%s: <%s>\n", FILENAME, $0
}' file
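A quick run of that approach (here feeding the input on stdin instead of a named file):

```shell
printf '1\n2\n3\n' | awk '
  {content = content $0 RS}
  END{
    $0 = content
    printf "%d input lines, %d characters\n", NR, length($0)
  }'
# 3 input lines, 6 characters
```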
That's the equivalent of sed's:
sed '
:1
$!{
N;b1
}
...' file1
Another issue with that approach is that if the file didn't end in a newline character (and wasn't empty), one is still arbitrarily added to $0 at the end (with gawk, you'd work around that by using RT instead of RS in the code above). One advantage is that you do get a count of the file's lines in NR/FNR.
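The extra newline is easy to observe (RT, in gawk, holds the text that actually matched RS, so it is empty at end of input, which is why concatenating RT instead of RS avoids the artifact):

```shell
# the input here deliberately lacks a trailing newline
printf 'a\nb' | awk '{content = content $0 RS} END{printf "<%s>", content}'
# prints <a\nb\n>: a final newline appears even though the input ended with "b"
```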
To work with several files at a time, one approach would be to do all the file reading by hand in a BEGIN statement (here assuming a POSIX awk, not the /bin/awk of Solaris with the API from the 70s):
awk -- '
BEGIN {
for (i = 1; i < ARGC; i++) {
FILENAME = ARGV[i]
$0 = ""
while ((getline line < FILENAME) > 0)
$0 = $0 line "\n"
# actual processing here, example:
print i". "FILENAME" has "NF" fields and "length()" characters."
}
}' *.txt
Same caveats about trailing newlines. That one has the advantage of being able to work with filenames that contain = characters.
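That matters because awk treats a var=value operand as a variable assignment rather than a file name, so such files are normally only reachable by prefixing ./; the BEGIN/getline loop above sidesteps the issue entirely. A sketch (the file name is a made-up example):

```shell
dir=$(mktemp -d) && cd "$dir"                 # throwaway directory
printf 'hello\n' > 'a=b.txt'
awk '{print FILENAME}' 'a=b.txt' </dev/null   # taken as the assignment a=b.txt: no file is read
awk '{print FILENAME}' ./'a=b.txt'            # prints ./a=b.txt
```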
Comments

… tr '\n' 'thatchar' the file before sending it to awk, and tr 'thatchar' '\n' the output? (You may still need to append a newline to ensure, like I noted above, that your input file has a terminating newline: { tr '\n' 'missingchar' < thefile ; printf "\n" ;} | awk ... | tr 'missingchar' '\n', but that adds a '\n' at the end that you may need to get rid of... maybe by adding a sed before the final tr, if that tr accepts files without a terminating newline.) – Olivier Dulac Sep 14 '16 at 16:36

… awk doesn't do the splitting if we don't. Having said that, not even the /bin/awk of Solaris 9 (based on the 1970s awk) had that limitation, so I'm not sure we can find one that does (still possible, as SVR4's oawk had a limit of 99 and nawk of 199, so it's likely the lifting of that limit was added by Sun and may not be found in other SVR4-based awks; can you test on AIX?). – Stéphane Chazelas Sep 01 '18 at 07:21

… the END block and call that function in every FNR==1 block as well as in the END. content would be reset after use in the function or in the FNR==1 block. – Ed Morton Jan 05 '21 at 20:30

… ARGV and read the lines with getline by hand in a BEGIN statement. – Stéphane Chazelas Jan 06 '21 at 09:18