Merging files from most recent

Question

I want a bash command that merges the files in a particular directory from the most recent to the oldest, meaning that files with newer dates are saved before ones with older dates.

You mean you want to order the files by date, and cat them in this order? Have a look at man ls for "sort by date" options. This will only work if the file names don't contain funny characters. — dirkt, Oct 07 '21 at 11:46
@schrodigerscatcuriosity No, I only know how to merge from oldest to most recent: for file in `ls -tr .*`; do cat $file >> Save.txt; done. So I was wondering if there is a way to save starting from the most recent dates. — Ngouaba Rosalie, Oct 07 '21 at 12:07
Regarding your many questions with no accepted answer, please see https://unix.stackexchange.com/help/someone-answers — Kusalananda, Oct 07 '21 at 12:56

Kusalananda · Answer 1 · 2021-10-07T16:22:25.280


In the zsh shell, the glob pattern *(.om), where (.om) is a glob qualifier, expands to the names of all regular files (.) in the current directory, ordered by modification time (om). The most recently modified file is first in the resulting list. If the directory does not have any regular files, the pattern fails with a "no matches found" error.

In the zsh shell, therefore,

cat ./*(.om) >Save.txt

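or, for many thousands of files (where a single cat command line could exceed the system's argument-length limit), with a loop. Note the >> (append) redirection: with >, every iteration would truncate Save.txt and only the last (oldest) file would survive. Delete any existing Save.txt first.

for name ( ./*(.om) ) cat $name >>Save.txt

Calling this from bash:

zsh -c 'for name ( ./*(.om) ) cat $name >>Save.txt'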

You may also use zargs in zsh, which is a sort of built-in variant of xargs:

autoload -U zargs
zargs -- ./*(.om) -- cat -- >Save.txt

From bash,

zsh -c 'autoload -U zargs; zargs -- ./*(.om) -- cat -- >Save.txt'

edited Oct 07 '21 at 16:22

answered Oct 07 '21 at 12:41

Kusalananda


zsh -c 'for name ( ./*(.om) ) cat $name' >Save.txt means ./Save.txt will be included in the expansion of ./*(.om) and if there also are files with modification times in the future, you could end up in some infinite loop that fills up the disks. – Stéphane Chazelas Oct 07 '21 at 16:09
@StéphaneChazelas Ah, because of the redirection happening before the invocation of zsh -c. Hopefully alleviated now by including the redirection in the zsh code instead. – Kusalananda Oct 07 '21 at 16:23
Or use autoload -U zargs; set -o extendedglob; { zargs -- ./^Save.txt(.om) -- cat -- ; } >Save.txt to be on the safe side. Note that the GNU implementation of cat will refuse to read the file its stdout is opened on (if it's a regular file). – Stéphane Chazelas Oct 07 '21 at 16:27

schrodingerscatcuriosity · Answer 2 · 2021-10-07T12:33:51.417

You can simply do this. Suppose we have these files:

$ cat a.txt 
a
$ cat b.txt 
b
$ cat c.txt 
c
$ ls -lt *.txt
-rw-rw-r-- 1 user user 2 oct  7 09:21 a.txt
-rw-rw-r-- 1 user user 2 oct  7 09:21 b.txt
-rw-rw-r-- 1 user user 2 oct  7 09:21 c.txt

Then we run this command:

$ ls -1t *.txt | xargs -I {} cat "{}" > Save.txt
$ cat Save.txt 
a
b
c

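ls -1t lists just the names of the files, one per line, sorted by modification time with the newest first.
xargs -I {} cat "{}" runs cat once for each input line, substituting that line for {}.
Note that Save.txt itself matches *.txt, so remove it (or write the output to another directory) before running the command again.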
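The three timestamps in the listing above are identical to the minute, so the resulting order depends on sub-minute differences that ls -l doesn't show. A minimal sketch, assuming GNU/Linux tools and a throwaway directory from mktemp (the demo_dir layout is hypothetical), that gives each file a distinct modification time with touch -t and writes Save.txt outside the input directory:

```bash
#!/usr/bin/env bash
# Reproduce the example with distinct modification times so that the
# newest-first order produced by `ls -t` is unambiguous.
set -euo pipefail

demo_dir=$(mktemp -d)          # hypothetical scratch area
mkdir "$demo_dir/in"
cd "$demo_dir/in"

printf 'a\n' > a.txt
printf 'b\n' > b.txt
printf 'c\n' > c.txt
touch -t 202110070900 c.txt    # oldest
touch -t 202110070910 b.txt
touch -t 202110070920 a.txt    # newest

# Save.txt lives one level up, so a later run's *.txt glob can't pick it up.
ls -1t -- *.txt | xargs -I {} cat -- {} > "$demo_dir/Save.txt"

cat "$demo_dir/Save.txt"       # a, b, c: newest file first
```

Because the names here contain no spaces, quotes or newlines, parsing ls output is safe in this demo; it is not safe in general.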

And an important note: see "Why not parse ls (and what to do instead)?"
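One way to avoid parsing ls without leaving bash is to let the shell's own glob list the files and sort them with the [[ A -nt B ]] ("A is newer than B") test. This is a sketch, not the linked article's recommendation: merge_newest_first is a hypothetical helper, it does O(n²) comparisons (fine for a modest number of files), and files with equal timestamps keep their glob order.

```bash
#!/usr/bin/env bash
# Concatenate the regular files in DIR to stdout, newest first, without
# parsing ls. merge_newest_first is a hypothetical helper name.
merge_newest_first() {
  local dir=$1 f i
  local -a sorted=()
  for f in "$dir"/*; do
    # Keep regular files (and symlinks to them); this also skips the
    # literal pattern left behind when the glob matches nothing.
    [[ -f $f ]] || continue
    # Insertion sort: shift every older entry one slot to the right.
    i=${#sorted[@]}
    while (( i > 0 )) && [[ $f -nt ${sorted[i-1]} ]]; do
      sorted[i]=${sorted[i-1]}
      i=$(( i - 1 ))
    done
    sorted[i]=$f
  done
  # Without this check, cat with no arguments would read stdin.
  if (( ${#sorted[@]} )); then
    cat -- "${sorted[@]}"
  fi
}

# Usage: write the output outside DIR, or the next run will include it.
# merge_newest_first /path/to/dir > merged.txt
```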

cas · Answer 3 · 2021-10-11T17:38:21.130

There are many ways of doing this, but if you want to stick to just shell syntax and common utilities, one of the best is to use the GNU versions of find (for the -printf option), sort and sed (for the -z option), and xargs (for -0):

# Exclude merged.txt: the redirection below may create it before find runs.
find . -maxdepth 1 -type f ! -name merged.txt -printf '%T@\t%p\0' |
  sort -z -r -n -k 1,1 |
  sed -z -e 's/^[^\t]*\t//' |
  xargs -0r cat > merged.txt

This will work with filenames containing ANY valid characters, including spaces, tabs, newlines and shell metacharacters such as ;, <, >, |, and &. NUL is the only character that can't appear in a file path, which is why it's used as the filename separator here (and why it's the only reliable filename separator to use).

The find command outputs the path of every regular file (-type f) in the current directory, prefixed with its modification time in seconds since the epoch (%T@) and a tab (\t), and terminated by a NUL character (\0). This is effectively an enhanced -print0 with a timestamp as well as the filename. The -maxdepth 1 option limits it to the current directory only, i.e. it tells find not to recurse into sub-directories.
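To see what the find stage produces, here is a sketch that swaps the \0 terminator for \n purely for display (keep \0 in the real pipeline). It assumes GNU find and GNU touch, whose -d option accepts @SECONDS; the scratch directory and a.txt are made up for the demo:

```bash
#!/usr/bin/env bash
# Show the "mtime<TAB>path" records emitted by find, one per line.
# Assumes GNU find and GNU touch (for -d @SECONDS).
set -euo pipefail

dir=$(mktemp -d)                    # hypothetical scratch directory
printf 'x\n' > "$dir/a.txt"
touch -d @1633597200 "$dir/a.txt"   # 2021-10-07 09:00:00 UTC

# %T@ = mtime in seconds since the epoch (with a fractional part),
# \t  = a literal tab, %p = the path, \n = record terminator (display only).
find "$dir" -maxdepth 1 -type f -printf '%T@\t%p\n'
```

The printed line starts with 1633597200, followed by the fractional part, a tab and the path.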

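This is then piped into sort, which sorts the records numerically (-n) and in reverse (-r), i.e. newest first, on the timestamp field (-k 1,1); then into sed, which removes the timestamp and tab from before each filename; and finally into xargs, which runs cat on all the filenames it reads from STDIN (-0 reads NUL-separated input, and -r skips running cat at all if there are none). Output is redirected to merged.txt.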
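To check the claim that any filename works, here is a sketch that runs the same pipeline on names containing a space, a newline and shell metacharacters. It assumes GNU find, sort, sed and xargs, and uses a hypothetical scratch directory; merged.txt is written outside the input directory so find can't pick it up:

```bash
#!/usr/bin/env bash
# Exercise the NUL-separated pipeline on awkward file names.
# Assumes GNU find, sort, sed and xargs (e.g. a typical Linux system).
set -euo pipefail

work=$(mktemp -d)                   # hypothetical scratch directory
mkdir "$work/in"
cd "$work/in"

# Names with a space, shell metacharacters and an embedded newline.
printf 'oldest\n' > 'plain.txt'
printf 'middle\n' > 'has space;&|.txt'
printf 'newest\n' > $'new\nline.txt'
touch -t 202110070900 'plain.txt'
touch -t 202110071000 'has space;&|.txt'
touch -t 202110071100 $'new\nline.txt'

# Same pipeline as above; output goes outside the input directory.
find . -maxdepth 1 -type f -printf '%T@\t%p\0' |
  sort -z -r -n -k 1,1 |
  sed -z -e 's/^[^\t]*\t//' |
  xargs -0r cat > "$work/merged.txt"

cat "$work/merged.txt"              # newest, middle, oldest
```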

BTW, if you're using FreeBSD or a Mac: FreeBSD's find also supports -printf, its version of sort supports -z, and their xargs has -0. Unfortunately, their sed doesn't support -z, so you'd have to use something else. perl is a good substitute, as its -p and -n options make it work very much like sed. For example, use the following in place of sed in the pipeline above:

perl -0 -p -e 's/^[^\t]*\t//'

or just install GNU sed.

BTW, there's no particular reason not to use perl on Linux too. It's just that sed is smaller and simpler and has slightly less startup overhead than perl, a trivial difference on modern systems.

Alternatively, you could just do the whole thing in perl:

$ perl -e '@ARGV = sort { (stat($b))[9] <=> (stat($a))[9] } @ARGV;
    while (<>) {
      if ($ARGV eq "merged.txt") { close(ARGV); next } ; # skip to next file
      print
    }' -- * > merged.txt

In this, perl sorts its filename arguments by modification time, newest first (comparing $b to $a reverses the usual order). It uses the built-in stat function, which returns a list with the modification timestamp as its 10th element; we use [9] because perl lists are indexed from 0 rather than 1 (see perldoc -f stat). It then prints the contents of each file, skipping "merged.txt", which is the redirection target. In essence, this is a re-implementation of cat in perl.

A fancier version would take a -o outputfile option or similar and open its own output file (removing the output filename from @ARGV before sorting, in case it already exists and is matched by the * glob). Then it wouldn't need to hard-code an exclusion for the output file.

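#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

my %opts;
getopts('o:', \%opts) or die "usage: $0 [-o outputfile] [--] file...\n";
$opts{o} = '/dev/stdout' unless defined($opts{o}); # default to stdout
# alternatively, you could print an error message to STDERR and exit:
# die "-o option is required\n" unless defined($opts{o});

# Remove the output file from the inputs (exact match, not a regex).
@ARGV = grep { $_ ne $opts{o} } @ARGV;
# Newest first: element 9 of stat's return list is the mtime.
@ARGV = sort { (stat($b))[9] <=> (stat($a))[9] } @ARGV;

open(my $out, '>', $opts{o}) or die "Can't open $opts{o}: $!\n";
while (<>) {
  print $out $_;
}
close($out);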

Save this as merge.pl somewhere in your $PATH (not in the current directory, otherwise it will be included in the output; there are ways to avoid this, but they'd make the script longer and a little more complicated than is wanted in a simple example), make it executable with chmod, and run it as:

merge.pl -o merged.txt -- *

Note: the grep, stat, and sort above are built-in perl functions, not the command line utilities. You can get details about them with perldoc -f.

AFAIK, FreeBSD / macos find still doesn't support GNU's -printf (it supports -print0 though). sort on BSDs is GNU sort, or at least used to be and still is on some. In any case, that explains why they support the GNU sort API. — Stéphane Chazelas, Oct 11 '21 at 07:07
So Apple not only have an ideological bias against GNU, they couldn't even be bothered to keep their BSD-based utils up to date. great. — cas, Oct 11 '21 at 17:32
