This is how you can do it:
i=$(((t=19876543212)-(h=12345678901)))  # t: end offset, h: start offset, i: bytes to copy
{ dd count=0 skip=1 bs="$h"             # seek past the first $h bytes
  dd count="$((i/(b=64*1024)))" bs="$b" # copy i/b whole 64k blocks
  # copy the remaining i%b bytes; skipped when there are none (bs=0 is invalid)
  [ "$((i%b))" -eq 0 ] || dd count=1 bs="$((i%b))"
} <infile >outfile
That's all that is really necessary. In the first place, dd count=0 skip=1 bs="$h" will lseek() over regular-file input practically instantaneously. There is no chance of missed data or whatever other untruths are told about it; you can just seek directly to your desired start position. Because the file descriptor is owned by the shell and the dd processes merely inherit it, they all move the same file offset, so you can just take it in steps. It really is very simple, and there is no standard tool better suited to the task than dd.
That uses a 64k block size, which is often ideal. Contrary to popular belief, larger block sizes do not by themselves make dd work faster. On the other hand, tiny buffers are no good either. dd needs to balance the time it spends in system calls against the time it spends copying data into memory and out again, so that it waits on neither. You want each read() to take enough time that the next one doesn't have to wait on the last, but not so much that you're buffering in larger sizes than necessary.
So the first dd skips to the start position, which takes practically no time. At that point you could call any other program you liked to read its stdin, and it would begin reading directly at your desired byte offset. I call another dd to copy i/b (the interval divided by the block size, rounded down) whole blocks to stdout.
The last thing that is necessary is to copy out the remainder (if any) of that division. And that's that.
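The same seek / whole blocks / remainder sequence can be wrapped in a small function and checked against a slow byte-at-a-time copy. This is a sketch, not the answer's own code: extract_range, its argument order and the sample file built with seq are hypothetical, and it assumes GNU dd (coreutils), where count=0 copies nothing.

```shell
#!/bin/sh
# extract_range SRC DST START LENGTH [BLOCKSIZE]
# Hypothetical helper: copy LENGTH bytes of SRC, starting at byte offset
# START, into DST. Assumes GNU dd, where count=0 copies nothing.
extract_range() {
    src=$1 dst=$2 start=$3 len=$4 b=${5:-65536}
    {
        # 1. Advance the shared file offset by START bytes; nothing is copied.
        #    bs=0 is not a valid block size, so skip this when START is 0.
        if [ "$start" -gt 0 ]; then
            dd count=0 skip=1 bs="$start"
        fi
        # 2. Copy LENGTH / b whole blocks (integer division).
        dd count="$((len / b))" bs="$b"
        # 3. Copy the last LENGTH % b bytes, if there are any.
        if [ "$((len % b))" -gt 0 ]; then
            dd count=1 bs="$((len % b))"
        fi
    } <"$src" >"$dst" 2>/dev/null
}

# Usage: build a sample file, then copy 150000 bytes starting at offset 12345.
tmp=$(mktemp -d)
seq 1 100000 >"$tmp/in.txt"
extract_range "$tmp/in.txt" "$tmp/out.txt" 12345 150000 4096
```

The two if guards cover the cases a one-liner leaves to chance: a start offset of 0 and a length that is an exact multiple of the block size. Comparing the result with dd bs=1 skip=12345 count=150000 is slow but an easy way to confirm the byte range.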
By the way, don't believe people who simply assert things as fact without evidence. Yes, it is possible for dd to do a short read (though that cannot happen when reading from a healthy block device, hence the name). It is only possible if you do not correctly buffer a dd stream that is read from something other than a block device. For example:
cat data | dd bs="$num" ### incorrect
cat data | dd ibs="$PIPE_MAX" obs="$buf_size" ### correct
In both cases dd copies all of the data. In the first case, though, it is possible (if unlikely with cat) that some of the output blocks dd writes will not equal "$num" bytes, because dd is specified to buffer only when a buffer is explicitly requested on its command line. bs= represents a maximum block size because the purpose of dd is real-time I/O.
In the second example I explicitly specify the output block size, so dd buffers reads until complete writes can be made. That doesn't affect count=, which is based on input blocks (a short read still counts as one block), but for that you just need another dd. Any misinformation you are given otherwise should be disregarded.
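The reblocking described above can be observed without depending on pipe timing. In this sketch ibs=1 stands in for a stream of tiny reads (every read returns a single byte) and obs=4 asks dd to gather them into 4-byte writes. It assumes GNU dd, whose stderr summary reports full+partial record counts; LC_ALL=C keeps that summary in English. The input string and sizes are arbitrary.

```shell
#!/bin/sh
# Feed 10 bytes through a pipe. ibs=1 makes every read a single byte,
# standing in for a stream of short reads; obs=4 (with no bs=) makes dd
# collect that input into 4-byte output blocks before writing.
# stderr (the summary) is captured; the copied data goes to /dev/null.
summary=$(printf 'abcdefghij' | LC_ALL=C dd ibs=1 obs=4 2>&1 >/dev/null)
printf '%s\n' "$summary"
```

With GNU dd the first two summary lines should read 10+0 records in and 2+1 records out: ten 1-byte reads gathered into two full 4-byte writes plus one final partial 2-byte write.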
bs=1M iflag=skip_bytes,count_bytes
– frostschutz Mar 27 '14 at 17:34
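The GNU-specific alternative suggested in that comment can be sketched as a single dd call. This assumes GNU coreutils dd, where iflag=skip_bytes and iflag=count_bytes make skip= and count= count bytes instead of input blocks, so bs=1M only sets the buffer size. copy_range_gnu and the sample file are hypothetical, for illustration only.

```shell
#!/bin/sh
# copy_range_gnu SRC DST START LENGTH
# Hypothetical helper; GNU dd only. skip_bytes/count_bytes make skip= and
# count= byte counts, so the range need not be a multiple of bs=.
copy_range_gnu() {
    dd if="$1" of="$2" bs=1M iflag=skip_bytes,count_bytes \
        skip="$3" count="$4" 2>/dev/null
}

# Usage: copy 150000 bytes starting at offset 12345 from a sample file.
tmp=$(mktemp -d)
seq 1 100000 >"$tmp/in.txt"
copy_range_gnu "$tmp/in.txt" "$tmp/out.txt" 12345 150000
```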