dd: multiple input files

Question

I need to concatenate chunks from two files:

If I needed to concatenate the whole files, I could simply do:

cat file1 file2 > output

But I need to skip the first 1 MB of the first file, and I only want 10 MB from the second file. Sounds like a job for dd.

dd if=file1 bs=1M count=99 skip=1 of=temp1
dd if=file2 bs=1M count=10 of=temp2
cat temp1 temp2 > final_output

Is it possible to do this in one step, i.e., without saving the intermediate results? Can I use multiple input files in dd?

score 24 · Accepted Answer · answered May 02 '16 at 11:36

dd writes to stdout when of= is omitted, so both invocations can share a single redirection:

( dd if=file1 bs=1M count=99 skip=1
  dd if=file2 bs=1M count=10  ) > final_output

meuh

This is probably the best way. The output file isn't closed/reopened (like it is with oflag=append conv=notrunc), so filesystems that do delayed allocation (like XFS) are least likely to decide the file is done being written when there's still more to go. – Peter Cordes May 02 '16 at 15:46
@PeterCordes that's a good point, but as long as dd isn't asked to sync, delayed allocation shouldn't kick in immediately anyway (unless memory is tight in which case neither method will postpone allocation). – Stephen Kitt May 02 '16 at 15:52
@StephenKitt: You're probably right. I was thinking of XFS's speculative preallocation, where it does need to specially detect the close/reopen access pattern (sometimes seen for log files). – Peter Cordes May 02 '16 at 15:59
In shells like bash and mksh that don't optimize out the fork for the last command in a subshell, you can make it slightly more efficient by replacing the subshell with a command group. For other shells, it shouldn't matter, and the subshell approach might even be slightly more efficient as the shell doesn't need to save and restore stdout. – Stéphane Chazelas May 02 '16 at 16:13
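
The command-group variant mentioned in the comment above could look like the following sketch. To keep it quick and self-contained it uses 1 KiB blocks instead of 1 MiB and generates its own sample input in a throwaway directory; the names demo_dir, out_size and b_count are made up for this illustration.

```shell
#!/bin/sh
# Scaled-down demo: bs=1k instead of bs=1M, sample files created on the fly.
# demo_dir, out_size and b_count are hypothetical names for this sketch.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# file1: 100 blocks of 'A', file2: 20 blocks of 'B'
head -c 102400 /dev/zero | tr '\0' 'A' > file1
head -c 20480 /dev/zero | tr '\0' 'B' > file2

# Command group instead of a subshell: both dd processes inherit the same
# open stdout, so the second continues where the first stopped writing.
{
  dd if=file1 bs=1k count=99 skip=1
  dd if=file2 bs=1k count=10
} > final_output 2>/dev/null

out_size=$(wc -c < final_output)                # bytes written in total
b_count=$(tr -d 'A' < final_output | wc -c)     # bytes that came from file2
echo "final_output: $out_size bytes, $b_count of them from file2"

cd / && rm -rf "$demo_dir"
```

With these sizes the result should hold 99 blocks from file1 followed by 10 blocks from file2. The group's stderr is discarded only to hide dd's transfer statistics.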

score 11 · Answer 2 · answered May 02 '16 at 08:18

I don't think you can easily read multiple files in a single dd invocation, but you can append to build the output file in several steps:

dd if=file1 bs=1M count=99 skip=1 of=final_output
dd if=file2 bs=1M count=10 of=final_output oflag=append conv=notrunc

You need to specify both conv=notrunc and oflag=append: the first avoids truncating the existing output file, and the second makes every write go to the end of that file.

Stephen Kitt

Promising, but it doesn’t work on macOS Catalina: "dd: unknown operand oflag". Any workarounds to avoid using cat? – sunknudsen May 15 '21 at 14:47
You could use meuh’s approach, or let the shell do the appending in the second step of my answer: dd if=file2 bs=1M count=10 >> final_output – Stephen Kitt May 15 '21 at 15:56
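
Put together, the two-step variant from the comment above might look like this sketch, which avoids oflag=append by letting the shell open the output in append mode. It is scaled down to 1 KiB blocks and generates its own sample files; demo_dir, out_size and b_count are hypothetical names.

```shell
#!/bin/sh
# Scaled-down demo (bs=1k instead of bs=1M) of appending via the shell's >>
# rather than GNU dd's oflag=append. All names here are illustrative only.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

head -c 102400 /dev/zero | tr '\0' 'A' > file1   # 100 blocks of 'A'
head -c 20480 /dev/zero | tr '\0' 'B' > file2    # 20 blocks of 'B'

# Step 1: create (or truncate) the output with the chunk from file1.
dd if=file1 bs=1k count=99 skip=1 of=final_output 2>/dev/null

# Step 2: no of= here, so dd writes to stdout, which the shell has
# opened in append mode on the existing file.
dd if=file2 bs=1k count=10 >> final_output 2>/dev/null

out_size=$(wc -c < final_output)
b_count=$(tr -d 'A' < final_output | wc -c)
echo "final_output: $out_size bytes, $b_count of them from file2"

cd / && rm -rf "$demo_dir"
```

Since >> is handled by the shell, this form does not depend on any dd extension.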

Answer 3 · Stéphane Chazelas · answered May 02 '16 at 16:04

Bear in mind that dd is a raw interface to the read(), write() and lseek() system calls. You can only use it reliably to extract chunks of data from regular files, block devices and some character devices (like /dev/urandom), that is, files for which read(buf, size) is guaranteed to return size as long as the end of the file is not reached.

For pipes, sockets and most character devices (like ttys), you have no such guarantee unless you do read()s of size 1, or use the GNU dd extension iflag=fullblock.

So either:

{
  gdd < file1 bs=1M iflag=fullblock count=99 skip=1
  gdd < file2 bs=1M iflag=fullblock count=10
} > final_output

Or:

M=1048576
{
  dd < file1 bs=1 count="$((99*M))" skip="$M"
  dd < file2 bs=1 count="$((10*M))"
} > final_output

Or with shells with builtin support for a seek operator like ksh93:

M=1048576
{
  command /opt/ast/bin/head -c "$((99*M))" < file1 <#((M))
  command /opt/ast/bin/head -c "$((10*M))" < file2
} > final_output

Or zsh (assuming your head supports the -c option here):

zmodload zsh/system &&
{
  sysseek 1048576 && head -c 99M &&
  head -c 10M < file2
} < file1 > final_output
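
The short-read caveat this answer starts with can be observed with a small experiment. The sketch below feeds dd from a pipe whose writer delivers its data in two bursts; slow_writer and the variable names are made up for the demonstration, and the fullblock part only runs if the local dd accepts that GNU flag.

```shell
#!/bin/sh
# A writer that emits 3 bytes in two separate writes, one second apart.
# slow_writer, short_read, byte_reads and full_read are hypothetical names.
slow_writer() {
  printf 'a'
  sleep 1
  printf 'bc'
}

# count=1 means ONE read() of up to 3 bytes. On a pipe that read usually
# returns as soon as the first byte arrives, so dd copies less than 3 bytes.
short_read=$(slow_writer | dd bs=3 count=1 2>/dev/null)

# With bs=1 every read() asks for a single byte and cannot come up short.
byte_reads=$(slow_writer | dd bs=1 count=3 2>/dev/null)

# GNU extension: keep reading until each block is full (skipped elsewhere).
full_read=""
if dd iflag=fullblock count=0 < /dev/null > /dev/null 2>&1; then
  full_read=$(slow_writer | dd bs=3 count=1 iflag=fullblock 2>/dev/null)
fi

printf 'bs=3 count=1           -> "%s"\n' "$short_read"
printf 'bs=1 count=3           -> "%s"\n' "$byte_reads"
printf 'bs=3 count=1 fullblock -> "%s"\n' "$full_read"
```

The first result depends on timing, which is exactly the problem: dd counts read() calls, not bytes, so an unlucky short read silently shrinks the output.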

Do you really need the quotes? Won't the result always be an integer? – Zombo May 02 '16 at 23:07
@StevenPenny, leaving the expansion unquoted asks the shell to split+glob it, which wouldn't make any sense here. The split part is done on the current value of $IFS, and that happens irrespective of the content of the variable/expansion. See also Security implications of forgetting to quote a variable in bash/POSIX shells – Stéphane Chazelas May 03 '16 at 07:29
@Stéphane Chazelas - in the first example, you are using gdd instead of dd. Is that a typo, or is that intentional? – Martin Vegter May 03 '16 at 09:00

score 3 · Answer 4 · edited May 02 '16 at 15:47

With a bashism, and a functionally "useless use of cat", but closest to the syntax the OP uses:

cat <(dd if=file1 bs=1M count=99 skip=1) \
    <(dd if=file2 bs=1M count=10) \
   > final_output

(That being said, Stephen Kitt's answer seems to be the most efficient possible method.)

edited May 02 '16 at 15:47

Stephen Kitt

answered May 02 '16 at 15:39

agc

Strictly speaking, <(...) is a kshism which both zsh and bash copied. – Stéphane Chazelas May 02 '16 at 15:44
