11

I'm currently having trouble with dd invoked with a sparse file as input (if=) and a file as output (of=) with conv=sparse. dd seems to be using only one core of the CPU (Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz, 4 cores + 4 Intel Hyperthreads), i.e. 100% of a single core, so I've been wondering whether it's possible to parallelize dd. I've been

  • looking into info dd and man dd, and there seems to be no built-in function for this in coreutils 8.23
  • checking sgp_dd from the sg3-utils package (without fully understanding whether it suits my needs), but it doesn't seem to be able to handle sparse files
  • checking dcfldd, which doesn't seem to have parallelization capabilities

AFAIK

  • an enhanced version/fork of dd that handles the work internally in multiple threads (avoiding context switches that kill I/O performance) is preferred over
  • a solution with GNU parallel running locally is preferred over
  • a custom (possibly untested) code snippet

How can I avoid the CPU being the bottleneck of an I/O-intensive operation? I'd like to run the command on Ubuntu 14.04 with Linux 3.13 and handle sparse file disk images with it on any filesystem that supports sparse files (at least the solution shouldn't be bound to one specific filesystem).

Background: I'm trying to create a copy of an 11 TB sparse file (containing about 2 TB of data) on ZFS (zfsonlinux 0.6.4, an unstable and possibly buggy version that may be the cause of the CPU bottleneck, e.g. through slow hole searching). That shouldn't change anything about the question of how to parallelize dd (in a very generic way).
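
For reference, a minimal sketch of the kind of single-process copy described above, plus a quick check that the destination stays sparse (the file names here are hypothetical):

dd if=disk.img of=disk-copy.img conv=sparse   # one dd, one core; default 512-byte blocks
ls -ls disk.img disk-copy.img                 # first column: blocks actually allocated
du -h --apparent-size disk.img disk-copy.img  # logical (apparent) size
du -h disk.img disk-copy.img                  # space actually used on disk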

  • I don't see what you could gain from this, as this operation is I/O bound except in extreme cases. In my opinion the best option would be a program that's sparse-aware, e.g. something like xfs_copy. Its man page mentions: "However, if the file is created on an XFS filesystem, the file consumes roughly the amount of space actually used in the source filesystem by the filesystem and the XFS log. The space saving is because xfs_copy seeks over free blocks instead of copying them and the XFS filesystem supports sparse files efficiently." – Cristian Ciupitu Oct 10 '14 at 20:40
  • @mikeserv I don't understand your comment... – Kalle Richter Oct 10 '14 at 20:42
  • @CristianCiupitu Well in my case the CPU is the bottleneck - don't ask me why, because I don't know. Your answer made me realize that the solution should support multiple filesystems (able to handle sparse files) (edited) – Kalle Richter Oct 10 '14 at 20:44
  • What CPU and filesystem do you have? How big is the file (length & blocks)? – Cristian Ciupitu Oct 10 '14 at 20:47
  • dd hogs the CPU by default due to small blocksize. Make it larger, like bs=1M (a sketch follows these comments). – frostschutz Oct 10 '14 at 23:49
  • @CristianCiupitu CPU-bound filesystem copy isn't as rare as you suggest. Besides faster SSDs, the file might actually be in the system's disk cache. I'm looking at a cp task taking 100% CPU on my server right now, I think because I have a 20GiB file that Ubuntu's disk caching seems to have loaded into RAM. – sudo Sep 08 '17 at 02:05
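
A minimal sketch of frostschutz's suggestion above (hypothetical file names again): raising bs cuts the number of read/write system calls per byte copied, which is usually what pins a single core.

dd if=disk.img of=disk-copy.img bs=1M conv=sparse   # 1 MiB blocks instead of the 512-byte default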

2 Answers

7

Tested in Bash:

INFILE=in
seq 0 1000 $((`stat --format %s $INFILE` /100000 )) |
  parallel -k dd if=$INFILE bs=100000 skip={} conv=sparse seek={} count=1000 of=out

You probably need to adjust 1000.
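
For orientation: stat --format %s prints the file size in bytes, dividing by 100000 converts that to 100 kB blocks, and seq 0 1000 … emits one starting offset per 1000-block (≈100 MB) slice, which each dd then copies with matching skip/seek. The same idea with 1 MiB blocks and 1 GiB slices might look like this (untested sketch, hypothetical file names; conv=notrunc is added because GNU dd otherwise truncates the output at each job's seek offset):

INFILE=disk.img
OUTFILE=disk-copy.img
seq 0 1024 $(( $(stat --format %s "$INFILE") / 1048576 )) |
  parallel -k dd if="$INFILE" of="$OUTFILE" bs=1M conv=sparse,notrunc skip={} seek={} count=1024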

Ole Tange
3

One custom, untested code snippet coming up:

: > newf   # create/empty the output once; notrunc below keeps the four writers from truncating it again
dd if=oldf conv=sparse,notrunc bs=1k                 count=3000000000                 of=newf &
dd if=oldf conv=sparse,notrunc bs=1k skip=3000000000 count=3000000000 seek=3000000000 of=newf &
dd if=oldf conv=sparse,notrunc bs=1k skip=6000000000 count=3000000000 seek=6000000000 of=newf &
dd if=oldf conv=sparse,notrunc bs=1k skip=9000000000 count=3000000000 seek=9000000000 of=newf &
wait

This should logically partition the file into four 3 TB chunks and process them in parallel. (skip= skips over input blocks; seek= seeks over output blocks; conv=notrunc stops each dd from truncating the output that the others are writing.) The fourth command will, of course, read up to the end of the old file, so the count= parameter isn't strictly necessary.
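
Generalizing the same idea so the offsets don't have to be hardcoded, here is an (equally untested) sketch in Bash; oldf/newf are kept from above, and the block size and job count are arbitrary choices:

IN=oldf
OUT=newf
BS=$((1024*1024))                        # 1 MiB blocks
JOBS=4                                   # number of parallel dd processes
SIZE=$(stat --format %s "$IN")           # input size in bytes
BLOCKS=$(( (SIZE + BS - 1) / BS ))       # input size in blocks, rounded up
CHUNK=$(( (BLOCKS + JOBS - 1) / JOBS ))  # blocks per chunk, rounded up

: > "$OUT"                               # create/empty the output once
for ((i = 0; i < JOBS; i++)); do
  dd if="$IN" of="$OUT" bs="$BS" conv=sparse,notrunc \
     skip=$((i * CHUNK)) seek=$((i * CHUNK)) count="$CHUNK" &
done
wait
truncate -s "$SIZE" "$OUT"               # restore the full length if the image ends in a hole

The final truncate matters because conv=sparse seeks over NUL blocks instead of writing them, so a copy whose tail is all holes would otherwise come out shorter than the original.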