
I am trying to create large dummy files on a drive using dd. I am currently doing this:

#!/bin/bash
writeFile(){ #$1 - destination directory/filename, $2 - source filepath $3 - blocksize, $4 - blockcount $5 - log file name

if [ "$#" -ne 5 ]; then
    echo "Bad number of args - Should be 4, not $#"
    return 1;
fi

dest_filepath=$1
src_filepath=$2
block_size=$3
block_count=$4
log_file=$5

int_regex='^[0-9]+$' 

file_size=$(($block_size * $block_count))
src_file_size=`ls -l $src_filepath | awk '{print $5}'`
full_iter=0
while [[ $file_size -ge $src_file_size ]]; do
    file_size=$((file_size - $src_file_size))
    full_iter=$((full_iter + 1))
done

section_block_count=$(($src_file_size / $block_size))
echo $section_block_count $block_size
topping_off_block_count=$(($file_size / $block_size))

dest_dir=$(dirname $dest_filepath)
if [ -d "$dest_dir" ] && [ -r $src_filepath ] && [[ $block_size =~ $int_regex ]] && [[ $block_count =~ $int_regex ]]; then
    data_written=0
    for (( i=0 ; i < $full_iter ; i=$((i+1)) )); do
        (time dd of=$dest_filepath if=$src_filepath bs=$block_size count=$section_block_count seek=$data_written) >> $log_file 2>&1 #Output going to external file
        data_written=$(($data_written + $src_file_size +1 ))
        echo $data_written
    done

    if [[ $file_size -gt 0 ]]; then
        (time dd of=$dest_filepath if=$src_filepath bs=$block_size count=$topping_off_block_count seek=$data_written) >> $log_file 2>&1 & #Output going to external file
    fi
    return 0;
fi

return 1;   
}

However, this isn't working: it's either writing from src_filepath only once, or writing over the same part of the file multiple times; I don't know how to tell the difference. In this particular case I'm writing from a 256 MB file 4 times to create a single 1 GB file, but I want to keep it generic so that I can write from and to any size.

The aim is to fragment a hard drive, and measure the output of dd (rate of transfer specifically) and the time it took.

I am on an embedded system with limited functionality; the OS is a very cut-down version of Linux using BusyBox.

How do I alter this so that it will write the correct size file?

Yann
  • Why don't you just cat the file? Something like for i in a b c d; do cat $file1 >> $file2; done? You seem to have chosen an extremely complex way to get this done, what is your actual objective? – terdon Oct 28 '14 at 15:34
  • Try adding conv=notrunc to the dd lines. – Mark Plotnick Oct 28 '14 at 16:00
  • @MarkPlotnick I gave it a go, but apparently my busybox system dd doesn't have support for conv -.- – Yann Oct 28 '14 at 16:04
  • Ah, in that case, please [edit] your question and include exactly what you're trying to do. Please also specify your OS and shell language (I know you tagged as bash but it should also be mentioned in the question since you have no shebang line). – terdon Oct 28 '14 at 16:23
  • busybox != bash – ErlVolton Oct 28 '14 at 16:24
  • I agree with terdon – you seem to have gone out of your way to make this much more complicated than it needs to be. But (1) “it's either writing from the src_filepath only once, or writing over the same part of the file multiple times, I don't know how to find out the difference.” It’s writing over the same part of the file multiple times. You can debug things like this by inserting a set -x command before any statements whose execution you want to monitor. See How to debug a bash script? – G-Man Says 'Reinstate Monica' Oct 28 '14 at 16:32
  • A couple of possible fixes: (2) Use the seek= option to dd, if your version supports it. (3) Rather than dd of=$dest_filepath … >> $log_file 2>&1, do dd … >> $dest_filepath 2>> $log_file. (4) A bit of general advice: always quote your shell variable references (e.g., "$dest_filepath" and "$log_file") unless you have a good reason not to, and you’re sure you know what you’re doing. – G-Man Says 'Reinstate Monica' Oct 28 '14 at 16:33
  • @G-Man I'm writing from the same file, so to look at it, there would be no difference between it writing once and over the same bit 4 times. Also, I am using seek. I'm not sure I follow your third point, and it's getting a little long for a comment, why not explain it in an answer? – Yann Oct 28 '14 at 16:37
  • Without conv=notrunc, dd will truncate the output file every time it's run. Can you get a traditional dd executable for your system? – Mark Plotnick Oct 28 '14 at 16:58
  • i think what you want to do is dd <<IN >file\n$(cat file file file file file)\nIN. If you have a tee at your disposal which will handle - args, then tee <file - - - - would work well there too. – mikeserv Dec 04 '14 at 11:54
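
For reference, a minimal sketch of the cat approach suggested in the comments above; the paths are placeholders and the iteration count of 4 matches the 256 MB → 1 GB case from the question:

src=/path/to/256MB.src   # placeholder source file
dest=/path/to/1GB.out    # placeholder destination file

: > "$dest"              # start with an empty destination
for i in 1 2 3 4; do     # 4 x 256 MB = 1 GB
    cat "$src" >> "$dest"   # each pass appends at end-of-file
done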

1 Answer


Replying to the comments: conv=notrunc makes dd not truncate, but it doesn't make dd seek to the end. (It leaves out O_TRUNC, but doesn't add O_APPEND in the open(2) system call.)

Answering the question: If you insist on using dd instead of cat, then get the shell to open the output file for append, and have dd write to its stdout.

dd if=src bs=128k count=$count of=/dev/stdout >> dest 2>> log
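
To get the sizes from the question, run that in a loop. A rough sketch (the paths are placeholders, the time usage matches the original script, and dd is simply left writing to its stdout, which is its default when of= is omitted):

src=/path/to/256MB.src    # placeholder paths for illustration
dest=/path/to/1GB.out
log=/path/to/dd.log

: > "$dest"                 # start from an empty destination
for i in 1 2 3 4; do        # 4 x 256 MB = 1 GB
    # the shell opens $dest with O_APPEND via >>, so each dd run's data
    # lands at the current end of the file; dd's transfer stats and the
    # time report both go to stderr and are appended to $log
    (time dd if="$src" bs=128k) >> "$dest" 2>> "$log"
done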

Also, if you're trying to fragment your drive, you could do a bunch of fallocate(1) allocations to use space, and then start using dd once the drive is near full. util-linux's fallocate program is a simple front-end to the fallocate(2) system call.
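
A rough sketch of that idea, assuming util-linux's fallocate is actually present on the box (the mount point and filler size are made up):

i=0
# pre-allocate 512 MB filler files until the filesystem runs out of room;
# the first failed fallocate ends the loop
while fallocate -l 512M "/mnt/target/filler.$i" 2>/dev/null; do
    i=$((i + 1))
done
# with free space now scarce, subsequent dd writes to the drive are much
# more likely to end up fragmented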

XFS, for example, will detect the open/append pattern and leave its speculatively preallocated space beyond EOF allocated for a few seconds after closing. So on XFS, a loop that appends to the same file repeatedly won't produce as much fragmentation as writing many small files.

You're on an embedded system, so I assume you're not using XFS. Even so, you might see less fragmentation from your close/reopen/write-more pattern than you'd expect, with a decently smart filesystem. Maybe sync between each write, to wait for the FS to allocate and write out all your data before letting it know there's more coming.

Peter Cordes