13

I have a bunch of files with the same extension (let's say .txt) and I would like to concatenate them. I am using cat *.txt > concat.txt, but I would like to add a new line between each file so as to distinguish them in concat.txt.

Is it possible to do it with a single bash command rather than an implementation such as this?

Thank you

Gigiux

9 Answers

14

Not a single command, but a simple one-liner:

for f in *.txt; do cat -- "$f"; printf "\n"; done > newfile.txt

That will give this error:

cat: newfile.txt: input file is output file

But you can ignore it, at least on GNU/Linux systems. Stéphane Chazelas pointed out in the comments that apparently, on other systems this could result in an infinite loop instead, so to avoid it, try:

for f in *.txt; do 
    [[ "$f" = newfile.txt ]] || { cat -- "$f"; printf "\n"; }
done > newfile.txt

Or just don't add a .txt extension to the output file (it isn't needed and doesn't make any difference at all, anyway) so that it won't be included in the loop:

for f in *.txt; do cat -- "$f"; printf "\n"; done > newfile
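
If the separator should appear only between files and not after the last one, one possible variation is to print the newline before every file except the first. This is a minimal sketch rather than part of the loop above; the function name cat_between and the sample files are made up for illustration, and mktemp is assumed to be available.

```shell
# Hypothetical helper: concatenate the given files with one empty line
# between them, and none after the last file.
cat_between() {
    first=1
    for f do
        # Print the separator before every file except the first.
        if [ "$first" -eq 1 ]; then first=0; else printf '\n'; fi
        cat -- "$f"
    done
}

# Demo in a scratch directory so no real files are touched.
demo_dir=$(mktemp -d) || exit 1
printf 'alpha\n' > "$demo_dir/a.txt"
printf 'beta\n'  > "$demo_dir/b.txt"

# The output name does not match *.txt, so it is not picked up by the glob.
cat_between "$demo_dir"/*.txt > "$demo_dir/joined"
cat "$demo_dir/joined"
```

This should print alpha, an empty line, then beta, with nothing after beta. Remove the scratch directory with rm -rf -- "$demo_dir" when done.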
terdon
  • 2
    Not all cat implementations will give you that input file is output file. Some others will happily run here potentially causing an infinite loop that fills up the filesystem. – Stéphane Chazelas Jan 11 '21 at 15:36
  • 1
    Note that [[ "$f" = "newfile.txt" ]] is a kshism. POSIXly, you'd use [ "$f" = newfile.txt ]. – Stéphane Chazelas Jan 11 '21 at 15:38
  • @StéphaneChazelas wait, what? That's a cat issue? I always thought it was the shell, not cat. Then why doesn't cat file1 file2 > file1 complain? As for the quotes, thanks fixed. Having unquoted strings feels weird to me. – terdon Jan 11 '21 at 15:45
  • 1
    For cat file > file, I suppose your cat detects file is empty and does nothing instead of reporting an error. Solaris cat still reports an error there. Note how the error message starts with cat:. I can't see how the shell could detect the condition. – Stéphane Chazelas Jan 11 '21 at 15:50
  • @StéphaneChazelas looks like you're right, unsurprisingly enough. This will reproduce the error: ( echo foo> newfile.txt; cat newfile.txt; ) > newfile.txt while this does not ( cat newfile.txt ) > newfile.txt. So my cat (GNU coreutils, 8.32) seems to detect that the file is empty and doesn't complain in the second one. TIL, thanks! – terdon Jan 11 '21 at 16:01
12

Using GNU sed:

sed -s -e $'$a\\\n' ./*.txt >concat.out

This concatenates all data to concat.out while at the same time appending an empty line to the end of each file processed.

The -s option to GNU sed makes the $ address match the last line in each file instead of, as usual, the last line of all data. The a command appends one or several lines at the given location, and the data added is a newline. The newline is encoded as $'\n', i.e. as a "C-string", which means we're using a shell that understands these (like bash or zsh). This would otherwise have to be added as a literal newline:

sed -s -e '$a\
' ./*.txt >concat.out

Actually, '$a\\' and '$a\ ' seem to work too, but I'm not entirely sure why.

This also works, if one thinks the a command is too bothersome to get right:

sed -s -e '${p;g;}' ./*.txt >concat.out

Any of these variations would insert an empty line at the end of the output of the last file too. If this final newline is not wanted, delete it by passing the overall result through sed '$d' before redirecting to your output file:

sed -s -e '${p;g;}' ./*.txt | sed -e '$d' >concat.out
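
A small demonstration of the ${p;g;} variant on two throwaway files may make the effect easier to see. It is only a sketch: it assumes GNU sed is installed as sed and that mktemp is available, and the file names and contents are invented for the example.

```shell
demo_dir=$(mktemp -d) || exit 1
printf 'one\n' > "$demo_dir/1.txt"
printf 'two\n' > "$demo_dir/2.txt"

# -s (GNU sed) treats each file separately, so $ matches the last line of
# each file. p prints that line; g then replaces the pattern space with the
# (empty) hold space, which is auto-printed as an empty line.
sed -s -e '${p;g;}' "$demo_dir"/*.txt > "$demo_dir/with_trailing"

# Same again, but drop the empty line that follows the last file.
sed -s -e '${p;g;}' "$demo_dir"/*.txt | sed -e '$d' > "$demo_dir/trimmed"

cat "$demo_dir/trimmed"
```

The with_trailing file should end with an empty line after two, while trimmed should contain one, an empty line, and two.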
Kusalananda
  • 1
    @StéphaneChazelas You know, GNU software tries to be so convenient that it's sometimes difficult to understand the magic that they implement... – Kusalananda Jan 11 '21 at 15:59
  • @StéphaneChazelas. sed -s -e $'a\\\n' adds an extra newline to every line of every file - not just the last line of each file. It is not equivalent to sed -s -e '${p;g;}' – fpmurphy Jan 12 '21 at 06:15
  • @Kusalananda. sed -s -e $'$a\n' ./*.txt >concat.out results in an extra newline at the end of concat.out. The OP wanted a newline between each file only. – fpmurphy Jan 12 '21 at 06:18
  • @fpmurphy, sorry, I meant $'$a\\\n', the point being that $'$a\n' is $a<newline>, not $a<backslash><newline> like in the variant not using $'...'. – Stéphane Chazelas Jan 12 '21 at 06:26
  • @fpmurphy I'm aware that they get an extra newline at the end, and I'm ignoring it as it's trivial to remove it. Hmmm... I might mention how to do that anyway... Stephane was referring to a previous edit of my text that did not have the p;g; variation. – Kusalananda Jan 12 '21 at 06:33
5

Using GNU awk:

gawk -v RS='^$' -v ORS= '{
    print sep $0; sep="\n";
}' ./file*.txt >single.file

See Slurp-mode in awk?

The ./ prefix on the file names avoids problems with files named like file=x.txt, for instance, since awk treats such arguments as variable assignments when they come after the awk program.

Another GNU awk approach would be:

gawk 'BEGINFILE{if (ARGIND>1) print ""};1' ./file*.txt >single.txt

which is better, as it adds an empty line even if a file's last line doesn't end in a newline character, and it avoids loading whole files into memory.


There is also a sed alternative, but it leaves an extra newline at the very end; to remove that, pipe the output through another command (sed ... | ...).

sed -s '$s/$/\n/' file*.txt >single.file
αғsнιη
5

zsh has a P glob qualifier to prefix each filename resulting from a glob with an arbitrary argument.

While it's typically used for things like cmd *.txt(P[-i]) to prefix each filename with a given option, you could use it here to insert any given file before each file. A temporary file containing an empty line can be created with =(print), so you could do:

() { cat file*.txt(P[$1]); } =(print)

On Linux or Cygwin, you could also do:

cat file*.txt(P[/dev/stdin]) <<< ''

To add an empty line between non-empty files only:

awk 'NR > 1 && FNR == 1 {print ""}; {print}' ./file*.txt
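
To illustrate why empty files get no separator with that awk command, here is a short sketch run on three made-up files, the middle one empty. The scratch directory and file contents are assumptions for the demo only.

```shell
demo_dir=$(mktemp -d) || exit 1
printf 'first\n' > "$demo_dir/file1.txt"
: > "$demo_dir/file2.txt"              # an empty file
printf 'third\n' > "$demo_dir/file3.txt"

# FNR == 1 is true on the first line of each file, and NR > 1 excludes the
# very first line of input. An empty file has no lines at all, so it never
# triggers the separator.
awk 'NR > 1 && FNR == 1 {print ""}; {print}' "$demo_dir"/file*.txt > "$demo_dir/joined"

cat "$demo_dir/joined"
```

The result should be first, one empty line, then third: the empty file2.txt contributes neither content nor an extra blank line.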
5

Perhaps not exactly what you were looking for, but like Quasímodo suggested in a comment, GNU's tail can add the empty line, in addition to a header with the filename:

$ echo 'this is foo' > foo.txt 
$ echo 'this is bar' > bar.txt   
$ tail -n+1 foo.txt bar.txt 
==> foo.txt <==
this is foo

==> bar.txt <==
this is bar

The -n+1 causes it to print the whole file; it means "print the tail starting from line 1."

If you want the header to be added even when there is only one file, for consistency, you can use -v.

$ tail -n+1 foo.txt        
this is foo
$ tail -v -n+1 foo.txt 
==> foo.txt <==
this is foo
JoL
1

This does not work in POSIX /bin/sh, but it does in bash:

cat file1 <(echo) file2 >concatenated

The <(echo) is replaced by the name of a file (a /dev/fd/N entry or, on systems without /dev/fd, a temporary named pipe) that is connected to the output of the echo command, which generates a single newline.
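
For an arbitrary number of files, the same idea of handing cat an extra "newline file" between the real ones could be sketched as below. Since a single <(echo) can only be read once, this sketch swaps the process substitution for a temporary file holding one newline, which also makes it work in plain sh. The function name cat_with_sep and the sample files are hypothetical, and mktemp is assumed to be available.

```shell
# Hypothetical helper: call cat once, with a one-newline file placed
# between each pair of arguments.
cat_with_sep() {
    [ "$#" -gt 0 ] || return 0
    sep=$(mktemp) || return 1
    printf '\n' > "$sep"
    first=1
    # "for f do" walks the original arguments while "set --" rebuilds
    # the argument list with the separator file interleaved.
    for f do
        if [ "$first" -eq 1 ]; then
            first=0
            set -- "$f"
        else
            set -- "$@" "$sep" "$f"
        fi
    done
    cat -- "$@"
    status=$?
    rm -f -- "$sep"
    return "$status"
}

# Demo on three throwaway files.
demo_dir=$(mktemp -d) || exit 1
printf 'A\n' > "$demo_dir/file1"
printf 'B\n' > "$demo_dir/file2"
printf 'C\n' > "$demo_dir/file3"
cat_with_sep "$demo_dir"/file? > "$demo_dir/concatenated"
cat "$demo_dir/concatenated"
```

The output should be A, B and C with a single empty line between each pair and none after C.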

  • 2
    ... but it will only work easily for two files, and the OP seems to have "a bunch" of them. Maybe you can expand the answer to show how to make this into a shell script accepting "an arbitrary" number of input files? – AdminBee Jan 13 '21 at 11:07
1

An example using Perl.

$ perl -e 'while(<>){print}continue{print"\n" if eof}' *.txt > concat.txt

which can be simplified to

$ perl -ne 'print; print "\n" if eof' [abc].txt > concat.txt
0

awk 1 *.txt

Per https://www.gnu.org/software/gawk/manual/html_node/Very-Simple.html

when the awk action is omitted, as it is here, the default action is to print (with a newline) all lines that match the pattern; 1 matches every input line (regardless of whether it was newline terminated or EOF).

Ben L
  • 1
0

Using Raku (formerly known as Perl_6)

Raku can take files off the command line into the $*ARGFILES dynamic variable. The key here is to remember to convert to individual $fh filehandles using the .handles routine. Then simply pad the output as desired:

~$ raku -e 'for $*ARGFILES.handles -> $fh { put $fh, "\n", $fh.lines.join("\n"), "\n" };' file?
fileA
>TCONS_00000867
>TCONS_00001442
>TCONS_00001447
>TCONS_00001528
>TCONS_00001529
>TCONS_00001668
>TCONS_00001921
>TCONS_00001922

fileB
>TCONS_00001528
>TCONS_00001529
>TCONS_00001668
>TCONS_00001921
>TCONS_00001922
>TCONS_00001924

fileC
>TCONS_00001529
>TCONS_00001668
>TCONS_00001921
>TCONS_00001922
>TCONS_00001924
>TCONS_00001956
>TCONS_00002048

fileD
>TCONS_00001922
>TCONS_00001924
>TCONS_00001956
>TCONS_00002048

Sample Input: from HERE

Sample Output: as above (each file padded with $fh filename before and a \n newline after).


NOTE: Raku has a similar next-handle method with an :on-switch named-argument (see second link below for example usage), but above seems simplest.

https://docs.raku.org/type/IO/ArgFiles#$*ARGFILES
https://docs.raku.org/type/IO/CatHandle#class_IO::CatHandle
https://raku.org

jubilatious1