Premise
You shouldn't run into that error for only 15k files with that specific name format [1,2].
If you are running that expansion from another directory and have to prepend the path to each file name, the command line grows accordingly, and then of course the error can occur.
Solution
Run the command from that directory.
(cd That/Directory ; cat file_{1..15000}.pdb >> file_all.pdb )
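To see what the path prefix costs, you can compare the expanded sizes (a sketch; That/Directory/ is a placeholder): each of the 15000 names grows by the length of the prefix, so a long enough path can push the total past ARG_MAX:
echo cat file_{1..15000}.pdb | wc -c                  # bare names
echo cat That/Directory/file_{1..15000}.pdb | wc -c   # 15 more bytes per name here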
Best Solution
If instead I guessed wrong and you are running it from the very directory the files are in...
IMHO the best solution is Stéphane Chazelas' one:
seq -f 'file_%.17g.pdb' 15000 | xargs cat > file_all.pdb
with printf or seq; tested on 15k files, each containing only its own number, with the cache pre-warmed, it is even the fastest one (at present, and apart from the OP's command run from the same directory the files are in).
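A sketch of the printf variant (my rendering, assuming the same naming scheme): printf is a bash builtin, so handing it 15000 brace-expanded arguments never goes through execve() and is not subject to ARG_MAX; xargs then batches the cat calls as needed:
printf '%s\n' file_{1..15000}.pdb | xargs cat > file_all.pdb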
A few words more
You should be able to pass longer command lines to your shell.
Your command line is 213914 characters long and contains 15003 words:
echo cat file_{1..15000}.pdb " > file_all.pdb" | wc
...and even adding 8 bytes for each word (for its argument pointer), 213914 + 8 × 15003 = 333938 bytes (0.3 MB), still far below the 2097142 (2.1 MB) reported by ARG_MAX
on a kernel 3.13.0, or the slightly smaller 2088232 reported as "Maximum length of command we could actually use" by xargs --show-limits.
Have a look, on your system, at the output of
getconf ARG_MAX
xargs --show-limits
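One caveat with GNU xargs: after printing the limits it still reads stdin and runs its command, so feed it empty input if you only want the numbers:
xargs --show-limits < /dev/null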
Laziness-guided solution
In cases like this I prefer to work in blocks, also because what comes out is usually a time-efficient solution.
The logic (if any) is: I'm far too lazy to write 1..1000, 1001..2000, etc. etc...
So I ask a script to do it for me.
Only after I've checked that the output is correct do I redirect it to a shell.
... but Laziness is a state of mind.
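For instance, a hypothetical mini-version of that workflow:
echo cat file_{1..100}.pdb                          # inspect the generated command first
echo cat file_{1..100}.pdb | /bin/bash >> all.pdb   # happy with it? feed it to a shell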
Since I'm allergic to xargs
(I really should have used xargs
here) and I do not want to check how to use it, I invariably end up reinventing the wheel, as in the examples below (tl;dr).
Note that since the file names are controlled (no spaces, newlines...) you can easily go with something like the scripts below.
tl;dr
Version 1: pass as optional parameters the first file number, the last one, the block size, and the output file name.
#!/bin/bash
StartN=${1:-1}          # First file number
EndN=${2:-15000}        # Last file number
BlockN=${3:-100}        # Files in a block
OutFile=${4:-"all.pdb"} # Output file name

CurrentStart=$StartN
# Note: seq yields the block *starts*, so the first chunk is the single
# file $StartN; every later chunk covers a full block of $BlockN files.
for i in $(seq $StartN $BlockN $EndN)
do
    CurrentEnd=$i
    cat $(seq -f file_%.17g.pdb $CurrentStart $CurrentEnd) >> $OutFile
    CurrentStart=$(( CurrentEnd + 1 ))
done
# Here you may need a last iteration for the tail that seq cut off
[[ $EndN -ge $CurrentStart ]] &&
    cat $(seq -f file_%.17g.pdb $CurrentStart $EndN) >> $OutFile
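Hypothetical usage, assuming the script was saved as cat_blocks.sh (the name is mine):
./cat_blocks.sh 1 15000 100 file_all.pdb   # first, last, block size, output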
Version 2
Calling a second bash for the brace expansion (a bit slower in my tests, ~20%). The trick is that brace expansion happens before parameter expansion, so the outer shell leaves file_{$CurrentStart..$CurrentEnd}.pdb alone; the inner /bin/bash receives e.g. cat file_{2..101}.pdb and expands it there.
#!/bin/bash
StartN=${1:-1}          # First file number
EndN=${2:-15000}        # Last file number
BlockN=${3:-100}        # Files in a block
OutFile=${4:-"all.pdb"} # Output file name

CurrentStart=$StartN
for i in $(seq $StartN $BlockN $EndN)
do
    CurrentEnd=$i
    # The outer shell leaves the braces literal; the inner bash expands them
    echo cat file_{$CurrentStart..$CurrentEnd}.pdb | /bin/bash >> $OutFile
    CurrentStart=$(( CurrentEnd + 1 ))
done
# Here you may need a last iteration for the tail that seq cut off
[[ $EndN -ge $CurrentStart ]] &&
    echo cat file_{$CurrentStart..$EndN}.pdb | /bin/bash >> $OutFile
Of course you can go further and get rid of seq
[3] (from coreutils) entirely and work directly with the variables in bash, or use python, or compile a C program to do it [4]...
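For instance, a minimal seq-free sketch of the same block logic, using only bash arithmetic (my variant, not tested against the timings above):
#!/bin/bash
StartN=${1:-1}; EndN=${2:-15000}; BlockN=${3:-100}; OutFile=${4:-"all.pdb"}
for (( i = StartN; i <= EndN; i += BlockN ))
do
    last=$(( i + BlockN - 1 ))        # end of this block...
    (( last > EndN )) && last=$EndN   # ...clamped to the last file
    names=()
    for (( j = i; j <= last; j++ )); do names+=( "file_$j.pdb" ); done
    cat "${names[@]}" >> "$OutFile"   # at most BlockN names per cat call
done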