I need to concatenate chunks from two files:

If I needed to concatenate whole files, I could simply do:

cat file1 file2 > output

But I need to skip the first 1 MB of the first file, and I only want 10 MB from the second file. Sounds like a job for dd.

dd if=file1 bs=1M count=99 skip=1 of=temp1
dd if=file2 bs=1M count=10 of=temp2
cat temp1 temp2 > final_output

Is it possible to do this in one step, i.e., without saving the intermediate results? Can I use multiple input files in dd?

Martin Vegter

4 Answers

dd can write to stdout too.

( dd if=file1 bs=1M count=99 skip=1
  dd if=file2 bs=1M count=10  ) > final_output
meuh
  • This is probably the best way. The output file isn't closed/reopened (like it is with oflag=append conv=notrunc), so filesystems that do delayed allocation (like XFS) are least likely to decide the file is done being written when there's still more to go. – Peter Cordes May 02 '16 at 15:46
  • @PeterCordes that's a good point, but as long as dd isn't asked to sync, delayed allocation shouldn't kick in immediately anyway (unless memory is tight in which case neither method will postpone allocation). – Stephen Kitt May 02 '16 at 15:52
  • @StephenKitt: You're probably right. I was thinking of XFS's speculative preallocation, where it does need to specially detect the close/reopen access pattern (sometimes seen for log files). – Peter Cordes May 02 '16 at 15:59
  • In shells like bash and mksh that don't optimize out the fork for the last command in a subshell, you can make it slightly more efficient by replacing the subshell with a command group. For other shells, it shouldn't matter, and the subshell approach might even be slightly more efficient as the shell doesn't need to save and restore stdout. – Stéphane Chazelas May 02 '16 at 16:13
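
The command-group variant suggested in the last comment might look like the sketch below. The function name concat_chunks and its three positional parameters are hypothetical and are not part of the original answer; bs=1M assumes a dd that accepts the M suffix, such as GNU dd.

```shell
# Hypothetical helper: same chunks as in the answer, but written with a
# command group { ...; } instead of a subshell ( ... ).
# $1 = first input, $2 = second input, $3 = output file
concat_chunks() {
  {
    # Skip the first 1 MiB block of $1, then copy up to 99 blocks.
    dd if="$1" bs=1M count=99 skip=1
    # Copy up to the first 10 MiB of $2.
    dd if="$2" bs=1M count=10
  } > "$3"   # both dd processes inherit this single redirection
}
```

It would be called as concat_chunks file1 file2 final_output. The output file is opened once for the whole group, just as in the subshell version.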
I don't think you can easily read multiple files in a single dd invocation, but you can append to build the output file in several steps:

dd if=file1 bs=1M count=99 skip=1 of=final_output
dd if=file2 bs=1M count=10 of=final_output oflag=append conv=notrunc

You need to specify both conv=notrunc and oflag=append: the first avoids truncating the output file, and the second makes dd start writing at the end of the existing file.

Stephen Kitt
  • Promising, but it doesn’t work on macOS Catalina (dd: unknown operand oflag). Any workarounds to avoid using cat? – sunknudsen May 15 '21 at 14:47
  • You could use meuh’s approach, or split it up: dd if=file2 bs=1M count=10 >> final_output in the second step of my answer. – Stephen Kitt May 15 '21 at 15:56
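
The split-up variant from the last comment could be sketched as follows. The function name append_chunks and its parameters are hypothetical; the block size is spelled out in bytes on the assumption that size suffixes such as M are not accepted identically by every dd implementation.

```shell
# Hypothetical helper: two-step build without the GNU-only oflag=append.
# $1 = first input, $2 = second input, $3 = output file
append_chunks() {
  # Step 1: create (or truncate) the output with the chunk from $1.
  dd if="$1" bs=1048576 count=99 skip=1 of="$3" &&
  # Step 2: let the shell open the output in append mode via >>.
  dd if="$2" bs=1048576 count=10 >> "$3"
}
```

It would be called as append_chunks file1 file2 final_output. Here the shell, rather than dd, is responsible for appending, so no dd-specific output flags are needed.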
Bear in mind that dd is a raw interface to the read(), write() and lseek() system calls. You can only use it reliably to extract chunks of data from regular files, block devices and some character devices (like /dev/urandom), that is, files for which read(buf, size) is guaranteed to return size bytes as long as the end of the file is not reached.

For pipes, sockets and most character devices (like ttys), you have no such guarantee unless you do read()s of size 1, or use the GNU dd extension iflag=fullblock.

So either:

{
  gdd < file1 bs=1M iflag=fullblock count=99 skip=1
  gdd < file2 bs=1M iflag=fullblock count=10
} > final_output

Or:

M=1048576
{
  dd < file1 bs=1 count="$((99*M))" skip="$M"
  dd < file2 bs=1 count="$((10*M))"
} > final_output

Or, in shells with builtin support for a seek operator, like ksh93:

M=1048576
{
  command /opt/ast/bin/head -c "$((99*M))" < file1 <#((M))
  command /opt/ast/bin/head -c "$((10*M))" < file2
} > final_output

Or zsh (assuming your head supports the -c option here):

zmodload zsh/system &&
{
  sysseek 1048576 && head -c 99M &&
  head -c 10M < file2
} < file1 > final_output
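
The short-read problem on pipes described in this answer can be observed with a small experiment. This is only an illustrative sketch: slow_writer is a hypothetical helper that delivers its data in two bursts, and the one-second pause is an arbitrary choice meant to make the first read() return before the second burst arrives.

```shell
# Hypothetical writer: sends "ab", pauses, then sends "cd".
slow_writer() { printf 'ab'; sleep 1; printf 'cd'; }

# One read() of up to 4 bytes: dd is expected to return after the
# first burst, so only part of the requested block is copied.
short=$(slow_writer | dd bs=4 count=1 2>/dev/null)

# Four read()s of 1 byte each: a 1-byte read cannot be short,
# so all four bytes are copied.
full=$(slow_writer | dd bs=1 count=4 2>/dev/null)

printf 'bs=4 count=1 copied: %s\n' "$short"
printf 'bs=1 count=4 copied: %s\n' "$full"
```

With a regular file as input instead of a pipe, both invocations would copy the same four bytes, which is why the dd commands in the question are safe there.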
With a bashism, and a functionally "useless use of cat", but closest to the syntax the OP uses:

cat <(dd if=file1 bs=1M count=99 skip=1) \
    <(dd if=file2 bs=1M count=10) \
   > final_output

(That being said, Stephen Kitt's answer seems to be the most efficient possible method.)

Stephen Kitt
agc