8

I have a large folder containing many sub-directories each holding many .txt files. I want to concatenate all of these files into one .txt file. I am able to do it for each of the sub-directories with cat *.txt>merged.txt, but I am trying to do it for all of the files in the large folder. How do I do this?

αғsнιη
  • 41,407

3 Answers

13

Try:

find /path/to/source -type f -name '*.txt' -exec cat {} + >mergedfile

This finds all '*.txt' files under /path/to/source, descending into sub-directories, and concatenates them into a single mergedfile.
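To see the effect on a small scale, here is a sketch that runs the same command against a throwaway tree. The directory layout and file names are made up for illustration, and it assumes mktemp -d is available.

```shell
# Build a small throwaway tree (all names here are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/src/a" "$demo/src/b/c"
printf 'one\n'   > "$demo/src/a/one.txt"
printf 'two\n'   > "$demo/src/b/two.txt"
printf 'three\n' > "$demo/src/b/c/three.txt"
printf 'skip\n'  > "$demo/src/b/notes.log"   # not *.txt, so find ignores it

# The redirection belongs to find itself, so every cat that find starts
# writes to the same already-open file; a plain > is enough.
# The output file lives outside src/, so find cannot pick it up as input.
find "$demo/src" -type f -name '*.txt' -exec cat {} + > "$demo/mergedfile"

# The merge order follows find's traversal, so sort before comparing.
sort "$demo/mergedfile"
```

The merged file should hold one line per .txt file; the order in which they appear depends on how find walks the directories.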

To concatenate each sub-directory's files within that directory, do:

find . -mindepth 1 -type d -execdir sh -c 'cat "$1"/*.txt >> "$1"/mergedfile' _ {} \;
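
A sketch of the per-directory variant on a throwaway tree follows. The directory and file names are hypothetical, and it assumes mktemp -d and a find that supports -mindepth and -execdir (GNU find, for example). The "$1" is quoted so that a directory name containing a space stays in one piece.

```shell
# Throwaway tree; the directory and file names are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b dir"
printf 'a1\n' > "$demo/a/1.txt"
printf 'a2\n' > "$demo/a/2.txt"
printf 'b1\n' > "$demo/b dir/1.txt"

# For each sub-directory, find starts sh and hands it the directory as $1
# (the _ fills $0), so "$1"/*.txt expands to that directory's text files.
(
  cd "$demo" &&
  find . -mindepth 1 -type d -execdir sh -c 'cat "$1"/*.txt >> "$1"/mergedfile' _ {} \;
)

cat "$demo/a/mergedfile"
cat "$demo/b dir/mergedfile"
```

Because the command appends with >>, running it twice on the same tree would duplicate the content of each mergedfile.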
αғsнιη
  • 41,407
  • >> can be > in the first find call. – Kusalananda Jun 04 '18 at 05:36
  • @Kusalananda Won't that truncate the mergedfile if ARG_MAX is exceeded? – αғsнιη Jun 04 '18 at 05:49
  • The > redirects the output of find, not cat. The cat command ends at the +, and you can't do redirections in -exec without using a child shell (sh -c). In your second example, you won't need it either as you do one directory at a time. – Kusalananda Jun 04 '18 at 05:52
  • Actually, that second example won't work. Since -execdir is already executing with the directory as the working directory, you should get rid of $1/ in the command. – Kusalananda Jun 04 '18 at 05:56
  • @Kusalananda Your first point, about using > instead of >> in the first command, is right, but $1/ is needed in the second command, and it works; I tested it before. Note that -execdir changes the directory for find, not for the child shell I used there. – αғsнιη Jun 04 '18 at 06:10
  • Ah, you're absolutely correct. I didn't notice you were searching for directories! – Kusalananda Jun 04 '18 at 06:30
  • What is the purpose of the underscore (_) that appears just before the opening and closing braces ({})? – Derek Mahar Mar 28 '23 at 01:04
  • @DerekMahar It is a placeholder for the inline shell command's "zeroth" argument, $0, which usually holds the name of the shell or script being executed, so any failure is reported under that name. You can use any other name instead of _ there. – αғsнιη Mar 28 '23 at 02:17
  • @αғsнιη, why does the command not work if I remove the underscore parameter? More specifically, why can't the first argument to sh be {} which is the placeholder that find replaces with each directory name? – Derek Mahar Mar 29 '23 at 12:48
  • 1
    @DerekMahar The _ is the 0th argument to sh -c '...' and {} is the 1st. When you remove the _, {} becomes the 0th argument, but the sh -c '...' script still operates on $1, and now there is no 1st ($1) argument. We don't put {} in that first slot because the argument right after the script always becomes $0, the script name, and errors and warnings are prefixed with that name to show where things went wrong. – αғsнιη Mar 29 '23 at 14:30
  • @αғsнιη, thank you for the explanation. I think I understand the purpose of the 0th argument. When you invoke a named shell script, $0 is always the name of the script, but in this case, the script is anonymous, so you must provide the name of the script as the first (0th) argument. Is this correct? – Derek Mahar Mar 29 '23 at 22:14
  • Example: sh -c 'echo $0 $1 $2' a b c prints a b c and sh -c 'echo $1 $2' a b c prints b c. In both cases, sh assigns a to $0, b to $1, and c to $2, but only the first example prints all of the arguments. – Derek Mahar Mar 29 '23 at 22:54
  • @DerekMahar Yes. All arguments passed to the script are available in both cases; it only depends on whether you use or print them. In the second example the 0th argument is also available to the script; not printing it doesn't mean it's not there. – αғsнιη Mar 30 '23 at 02:04
  • Ran into the issue that the file itself was added over and over again because I used the current directory for search and output. Maybe it also helps others: find ./ -type f -name '*.txt' -not -name 'mergedfile.txt' -exec cat {} + >mergedfile.txt. – BadAtLaTeX Feb 29 '24 at 18:24
2

If you are using Bash and the number of text files is manageable (i.e. the expanded list of names does not exceed the system's limit on the total size of command-line arguments, which is large but finite), you can easily achieve this with the globstar feature:

$ shopt -s globstar
$ cat **/*.txt > merged.txt

A more general, although less elegant, approach is to use find as the driver and have it call cat on each file, appending the output:

$ find . -name '*.txt' -exec sh -c 'cat "$1" >> merged.out' sh {} \;

Calling sh is needed here because the >> redirection has to be performed by a shell for each cat; -exec on its own does not interpret redirections. Make sure the output file has a different extension or lies outside of the tree you're merging, or find may try to concatenate the output with itself.
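
Passing each file name to sh as a positional parameter, rather than embedding {} in the script text, keeps names with spaces or quotes intact. A minimal sketch on a throwaway tree follows; the file names are hypothetical, mktemp -d is assumed to be available, and the word sh after the script is just a placeholder for $0.

```shell
# Throwaway tree with awkward names (all names are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
printf 'alpha\n' > "$demo/top.txt"
printf 'beta\n'  > "$demo/sub dir/it's here.txt"

# Each file name arrives in the child shell as "$1"; the word after the
# script text fills $0. merged.out does not match *.txt, so find never
# reads it back as input even though it sits inside the tree.
(
  cd "$demo" &&
  find . -type f -name '*.txt' -exec sh -c 'cat "$1" >> merged.out' sh {} \;
)

# The merge order follows find's traversal, so sort before comparing.
sort "$demo/merged.out"
```

If the output had been named merged.txt instead, it would match the pattern and would need to be excluded or written outside the tree.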

undercat
  • 1,857
1

If you have to do the concatenation in a particular order, the following concatenates the files in lexicographical order (sorted by pathname) in bash:

shopt -s globstar
for name in **/*.txt; do
    [ -f "$name" ] && cat <"$name"
done >merged.out

This is similar to the find command

find . -type f -name '*.txt' -exec cat {} ';' >merged.out

except that the ordering may be different, symbolic links to regular files would be included (add && [ ! -L "$name" ] if you don't want them), and hidden files (and files in hidden directories) would be excluded (use shopt -s dotglob to add them back).
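
Where globstar is not available, a fixed order can be approximated by sorting find's output. This is a different technique from the loop above, not an equivalent: the sketch assumes no file name contains a newline, pins byte order with LC_ALL=C (so the order may differ from what a glob gives in other locales), and uses made-up file names in a directory created with mktemp -d.

```shell
# Throwaway tree, created out of order on purpose (names are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/b" "$demo/a"
printf 'from b/2\n' > "$demo/b/2.txt"
printf 'from a/1\n' > "$demo/a/1.txt"
printf 'from a/3\n' > "$demo/a/3.txt"

# Sort the pathnames, then cat them one at a time in that order.
# Assumes no pathname contains a newline character.
(
  cd "$demo" &&
  find . -type f -name '*.txt' | LC_ALL=C sort |
  while IFS= read -r name; do
    cat <"$name"
  done >merged.out
)

cat "$demo/merged.out"
```

Here the files are merged as ./a/1.txt, ./a/3.txt, ./b/2.txt, regardless of the order in which find happens to list them.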

Kusalananda
  • 333,661
  • What does your first command do that the one in my answer doesn't? – αғsнιη Jun 04 '18 at 06:14
  • @αғsнιη Absolutely nothing, now that you've changed your answer. I will modify that part. Thanks for letting me know. – Kusalananda Jun 04 '18 at 06:26
  • Does bash guarantee that **/*.txt sorts the pathnames in lexicographical order? – Derek Mahar Mar 29 '23 at 22:27
  • 1
    @DerekMahar Yes, the list resulting from expanding a globbing pattern is guaranteed to be lexicographically sorted. From the POSIX standard: "If the pattern matches any existing filenames or pathnames, the pattern shall be replaced with those filenames and pathnames, sorted according to the collating sequence in effect in the current locale." – Kusalananda Mar 29 '23 at 22:36