Under the hood
There is no more efficient way than copying the first file, then copying the second file after it, and so on. Both DOS `copy` and `cat` do that.
Each file is stored independently of other files on the disk. Almost every filesystem designed to store data on a disk-like device operates by blocks. Here's a highly simplified presentation of what happens: the disk is divided into blocks of, say, 1 kB, and for each file the operating system stores the list of blocks that make it up. Most files aren't an integer number of blocks long, so the last block is only partially occupied. In practice, filesystems have many optimizations, such as sharing the last partial block between several files or storing “blocks 46798 to 47913” rather than “block 46798, block 46799, …”.

When the operating system needs to create a new file, it looks for free blocks. The blocks don't have to be consecutive: if only blocks 4, 5, 98 and 178 are free, you can still store a 4 kB file. Using blocks rather than going down to the byte level makes finding free blocks for a new or growing file considerably faster, and reduces the problems due to fragmentation when you create or grow and delete or shrink a lot of files (leaving an increasing number of holes).
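You can observe this block-granular accounting directly. The sketch below assumes a Linux system with GNU coreutils; the exact block size depends on the filesystem (4 KiB is typical on ext4), and the file name is a throwaway example:

```shell
# A 1-byte file still occupies a whole allocation block on most filesystems.
printf x > tiny.bin
stat -c 'apparent size: %s bytes' tiny.bin        # 1
stat -c 'allocated: %b blocks of %B bytes' tiny.bin
du -h tiny.bin    # disk usage, rounded up to block granularity
rm tiny.bin
```

The gap between the apparent size and the allocated size is exactly the partially occupied last block described above.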
You could support partial blocks in mid-file, but that would add considerable complexity, particularly when accessing files non-sequentially: to jump to the 10340th byte, you could no longer jump to the 100th byte of the 11th block, you'd have to check the length of every intervening block.
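With fixed-size blocks, the lookup in that example is plain integer arithmetic, which is exactly why filesystems insist on full blocks in mid-file:

```shell
# Locate byte 10340 in a file stored as fixed 1 KiB blocks.
blocksize=1024
offset=10340
echo "block index:     $(( offset / blocksize ))"   # 10, i.e. the 11th block
echo "offset in block: $(( offset % blocksize ))"   # 100
```

Allow variable-length blocks anywhere and this O(1) computation becomes a linear scan over every preceding block's length.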
Given the use of blocks, you can't just join two files, because in general the first file ends in mid-block. Sure, you could have a special case, but only if you want to delete both files when concatenating; that would be highly specific handling for a rare operation. Such special handling doesn't live on its own, because on a typical filesystem, many files are being accessed at the same time. So if you want to add an optimization, you need to think carefully: what happens if some other process is reading one of the files involved? What happens if someone tries to concatenate A and B while someone else is concatenating A and C? And so on. All in all, this rare optimization would be a huge burden.
In short, you can't make joining files more efficient without making major sacrifices elsewhere. It's not worth it.
On splitting and joining
`split` and `cat` are simple ways of splitting and joining files. `split` takes care of producing files named in alphabetical order, so that `cat *` works for joining.
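A quick round trip shows the two commands cooperating (the file names here are throwaway examples):

```shell
# Split a file into 1 MiB pieces, rejoin them, and verify byte-for-byte.
head -c 2500000 /dev/urandom > big.bin     # throwaway test file
split -b 1M big.bin part_                  # -> part_aa, part_ab, part_ac
cat part_* > rejoined.bin                  # alphabetical order restores the file
cmp big.bin rejoined.bin && echo "identical"
rm big.bin rejoined.bin part_*
```

`split -b` cuts at exact byte counts, so the pieces carry no metadata; the join is nothing more than concatenation in name order.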
A downside of `cat` for joining is that it is not robust against common failure modes. If one of the files is truncated or missing, `cat` will not complain; you'll just get damaged output.
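One way to get that robustness while staying with plain `split` and `cat` is to record checksums at split time and verify them before joining. This is a sketch using GNU coreutils' `sha256sum`; the file names are made up:

```shell
# Sender: split the file and fingerprint each part.
head -c 2500000 /dev/urandom > big.bin     # throwaway test file
split -b 1M big.bin part_
sha256sum part_* > parts.sha256

# Receiver: refuse to join unless every part is present and intact.
sha256sum -c parts.sha256 && cat part_* > rejoined.bin
cmp big.bin rejoined.bin && echo "verified join"
rm big.bin rejoined.bin parts.sha256 part_*
```

If a part is missing or truncated, `sha256sum -c` reports it and exits nonzero, so the `cat` never runs on damaged input.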
There are compression utilities that produce multipart archives, such as `zipsplit` and `rar -v`. They aren't very unixy, because they compress and pack (assemble multiple files into one) in addition to splitting (and conversely unpack and uncompress in addition to joining). But they are useful in that they verify that you have all the parts, and that the parts are complete.
Comments

A caveat about `cat x*`: the order of files depends on your locale settings. Better start typing `cat x`, then press Esc and then `*` - you'll see the expanded order of files and can rearrange. – rozcietrzewiacz Nov 15 '11 at 12:33

As an alternative to `cat x*`, you could consider shell brace expansion: `cat xa{a..g}` expands the specified sequence to `cat xaa xab xac xad xae xaf xag`. – Peter.O Nov 15 '11 at 12:57

`copy` (on Windows) seemed like a more efficient method than `cat`, partly because the help for `copy` mentions that it can be used this way. I knew that `cat` would work to join files, and it works quickly with small files, but I was trying to ask if there was a better way to join files - especially very large files. – cwd Nov 15 '11 at 14:21

Why would locale settings affect `cat x*`? Would the new locale setting not also affect `split`, so that if `split` and `cat x*` were used on the same system they would always work? – cwd Nov 15 '11 at 14:29

`copy /b … outputfile` does exactly what `cat … >outputfile` does. The `/b` flag tells `copy` not to mess up the data, and the syntax of `copy` is weird, but under the hood they do the same job. – Gilles 'SO- stop being evil' Nov 15 '11 at 23:31

Good to know that `cat` is in fact the best way. – cwd Nov 15 '11 at 23:44

The `split` command constructs its output file names in a manner that isn't susceptible to locale-specific reordering. (Though I suppose you could create a customized locale in which the 26 lowercase Latin letters aren't in their usual order.) – Keith Thompson Nov 16 '11 at 02:00

In `split.c` in GNU Coreutils, the suffixes are constructed from a fixed array of characters: `static char const *suffix_alphabet = "abcdefghijklmnopqrstuvwxyz";`. The suffix wouldn't be affected by the locale. (But I don't think any sane locale would reorder the lowercase letters; even EBCDIC maintains their standard order.) – Keith Thompson Nov 16 '11 at 02:04

Regarding `split`, I agree with Keith. I was referring to a general habit of concatenating files - and, more broadly, feeding a list of files to a command. – rozcietrzewiacz Nov 16 '11 at 08:06

You can also nest the braces: `cat x{{a..j}{a..z},k{a..f}} > myImage.iso` will expand from `xaa` to `xkf`. – Madacol Mar 01 '20 at 22:46
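The brace-expansion trick from these comments is easy to check: bash performs brace expansion before globbing, and it generates names in literal, locale-independent order:

```shell
# Simple range: expands in fixed left-to-right order, regardless of locale.
bash -c 'echo xa{a..g}'
# xaa xab xac xad xae xaf xag

# Nested braces (a smaller analogue of the x{{a..j}{a..z},k{a..f}} pattern):
bash -c 'echo x{{a..b}{a..c},d{a..b}}'
# xaa xab xac xba xbb xbc xda xdb
```

Unlike `cat x*`, whose order depends on `LC_COLLATE`, an explicit brace expansion always produces the same sequence.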