Just encrypt zeroes(***).
Encryption does exactly what you want. Encrypted zeroes look like random data, and decrypting turns them back into zeroes. It's deterministic, repeatable, and reversible, as long as you know the key and keep using the same cipher settings(**).
Example: overwriting a drive with random data using cryptsetup:
cryptsetup open --type plain --cipher aes-xts-plain64 /dev/deletedisk cryptodeletedisk
# overwrite with zeroes
pv < /dev/zero > /dev/mapper/cryptodeletedisk
# verify (also drop caches or power cycle)
echo 3 > /proc/sys/vm/drop_caches
pv < /dev/mapper/cryptodeletedisk | cmp - /dev/zero
### alternatively, run badblocks in destructive mode:
badblocks -w -b 4096 -t 0 -v -s /dev/mapper/cryptodeletedisk
This should utilize full disk speed on a modern system with AES-NI.
If no errors are found, you know that the drive was fully overwritten with random data, and returned the correct data when reading it back.
Example using cryptsetup for repeatedly piping the same random data (without involving real storage).
In this example, instead of using /dev/zero as the data source, we exploit the fact that sparse files can be created arbitrarily large and read back as all zeroes:
# truncate -s 1E exabyte_of_zero
# cryptsetup open --type plain --cipher aes-xts-plain64 --readonly exabyte_of_zero exabyte_of_random
Enter passphrase for exabyte_of_zero: yourseed
# hexdump -C -n 64 /dev/mapper/exabyte_of_random
# cat /dev/mapper/exabyte_of_random | something_that_wanted_random_data
This only works if your filesystem supports sparse files without side effects (does not work on tmpfs).
Note: this is just an example; actually creating a virtual exabyte device like this might have side effects. Limit the size to something reasonable for your use case, and put the file in a location where backup scripts don't try to pick it up and compress it, etc.
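Whether a given filesystem really keeps such a file sparse can be checked before building anything on top of it. The following is a minimal sketch, assuming GNU coreutils (truncate, stat -c, du); the scratch file and the 64M size are arbitrary placeholders, not part of the commands above:

```shell
# Hypothetical scratch file; name and size are placeholders for illustration.
f=$(mktemp)
truncate -s 64M "$f"

apparent=$(stat -c %s "$f")            # apparent size in bytes (GNU stat)
allocated_kib=$(du -k "$f" | cut -f1)  # space actually allocated, in KiB
# Count non-zero bytes in the first MiB; a fresh sparse file should have none.
nonzero=$(head -c 1048576 "$f" | tr -d '\000' | wc -c)

echo "apparent size: $apparent bytes"
echo "allocated:     $allocated_kib KiB"
echo "non-zero bytes in first MiB: $nonzero"
rm -f "$f"
```

If the allocated figure comes out close to the apparent size, the filesystem is not storing the file sparsely and the trick above will consume real space.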
cryptsetup provides a readable block device, and it's seekable, so you can also use it to start comparing data somewhere in the middle of a file. You can also use it to analyze the random data at any offset and make sure it's really random and does not repeat(*). A traditional PRNG would usually require you to re-generate everything from start.
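Because the mapped device is seekable, any region can be read or compared directly without streaming through everything before it. Here is a sketch with two hypothetical helper functions (read_blocks and same_region are names invented for this illustration). They use only dd and sha256sum and should work on any seekable file or block device, /dev/mapper/exabyte_of_random from the example above being one possible argument:

```shell
# read_blocks SRC SKIP COUNT [BS]
# Print COUNT blocks of BS bytes (default 4096) from SRC, starting at block SKIP.
read_blocks() {
    dd if="$1" bs="${4:-4096}" skip="$2" count="$3" 2>/dev/null
}

# same_region A B SKIP COUNT
# Succeed (exit 0) if both sources hash identically over that block range.
# Caveat: a range past the end of both sources is empty on each side and
# therefore also compares equal.
same_region() {
    sum_a=$(read_blocks "$1" "$3" "$4" | sha256sum)
    sum_b=$(read_blocks "$2" "$3" "$4" | sha256sum)
    [ "$sum_a" = "$sum_b" ]
}

# Example calls (commented out: reading the mapped device needs suitable
# permissions, and /path/to/other_copy is a placeholder):
# read_blocks /dev/mapper/exabyte_of_random 262144 1 | hexdump -C | head
# same_region /dev/mapper/exabyte_of_random /path/to/other_copy 262144 256
```

With 4096-byte blocks, block 262144 corresponds to an offset of 1 GiB, so the first example call would dump data from the middle of the device without reading the first gigabyte.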
cryptsetup requires root permissions (although the resulting device-mapper device could be made readable for other users).
Without root permissions, you can use openssl to generate random data by encrypting zeroes. (Already mentioned in comments.)
$ openssl enc -pbkdf2 -aes-256-ctr -nosalt \
-pass pass:yourseed < /dev/zero \
| hexdump -C -n 64
00000000 62 5e 3d cd 39 dc d6 a2 bb 73 2d 0f 63 b1 f1 75 |b^=.9....s-.c..u|
00000010 4d 84 f5 75 cb b6 1e 33 9c e8 41 9c 76 4b 7e 12 |M..u...3..A.vK~.|
00000020 c2 90 d5 93 2d a9 9e a0 48 bd b8 3e a5 1a d6 f7 |....-...H..>....|
00000030 2c a6 e0 07 4d 5a 45 31 13 dc ef 97 df 76 c5 b8 |,...MZE1.....v..|
00000040
This approach is also suggested by the coreutils documentation, in the section "Sources of random data", as a way to "generate a reproducible arbitrary amount of pseudo-random data given a seed value".
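The reproducibility and reversibility claimed at the top can be checked with the same openssl command. In this sketch, seeded_random is a hypothetical wrapper name and the 4096-byte sample size is arbitrary; the -pbkdf2 option assumes OpenSSL 1.1.1 or newer:

```shell
# seeded_random SEED
# Endless pseudo-random stream: the AES-256-CTR output from encrypting zeroes.
seeded_random() {
    openssl enc -pbkdf2 -aes-256-ctr -nosalt \
        -pass pass:"$1" < /dev/zero 2>/dev/null
}

# Same seed twice: the checksums should match.
sum1=$(seeded_random yourseed | head -c 4096 | sha256sum)
sum2=$(seeded_random yourseed | head -c 4096 | sha256sum)
# Different seed: the checksum should differ.
sum3=$(seeded_random otherseed | head -c 4096 | sha256sum)

# Decrypting with the same settings should give back only zero bytes.
nonzero=$(seeded_random yourseed | head -c 4096 \
    | openssl enc -d -pbkdf2 -aes-256-ctr -nosalt -pass pass:yourseed 2>/dev/null \
    | tr -d '\000' | wc -c)

[ "$sum1" = "$sum2" ] && echo "same seed: identical data"
[ "$sum1" != "$sum3" ] && echo "different seed: different data"
echo "non-zero bytes after decryption: $nonzero"
```

In bash, a function like this can also feed GNU tools that accept --random-source, for example shuf -i1-100 --random-source=<(seeded_random 42), which is the kind of usage the coreutils manual shows.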
Instead of encrypting zeroes, you can also use a traditional PRNG for the job. It's just that I'm not aware of any standard tools that provide it. So this approach involves picking any PRNG algorithm / library of your choice and writing a few lines of code to produce the data.
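To give an idea of what "a few lines of code" could look like, here is a toy seeded generator in plain shell. It is only a sketch and not a recommendation: lcg_bytes is a hypothetical name, the algorithm is a simple linear congruential generator using the constants from the sample rand() in the C standard, and it is both statistically weak and very slow. A real use case would pick a proper PRNG library instead.

```shell
# lcg_bytes SEED COUNT
# Toy linear congruential generator: prints COUNT pseudo-random bytes for SEED.
# SEED must be an integer between 0 and 2147483647.
# Illustration only: statistically weak and slow, not suitable for real workloads.
lcg_bytes() {
    state=$1
    remaining=$2
    while [ "$remaining" -gt 0 ]; do
        # Keep the state within 31 bits so the multiplication cannot overflow
        # 64-bit shell arithmetic.
        state=$(( (state * 1103515245 + 12345) % 2147483648 ))
        byte=$(( (state / 65536) % 256 ))   # take bits 16..23 of the state
        # Emit the byte via an octal escape such as \101.
        printf "\\$(printf '%03o' "$byte")"
        remaining=$((remaining - 1))
    done
}

# The same seed always produces the same bytes.
lcg_bytes 42 32 | od -An -tx1
lcg_bytes 42 32 | od -An -tx1
```

Unlike the seekable cryptsetup device, a generator like this has to be re-run from the seed to reach a given position in the stream.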
shred has a nice PRNG but you can't seed it yourself and also can't pipe it, so there is no way to utilize it here.
tee can multiplex data from a single random source to multiple processes, however it requires this data to be consumed immediately and in parallel (example: Shuffle two parallel text files), so this is only suitable if the data has to be identical, but does not have to be reproduced again at a later time.
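A minimal sketch of that tee approach in plain POSIX shell, assuming a named pipe as the second channel. The two sha256sum processes merely stand in for whatever two consumers need identical data, and the 1 MiB sample size is arbitrary:

```shell
tmp=$(mktemp -d)
mkfifo "$tmp/copy"

# Consumer 1 reads its copy from the named pipe, in the background.
sha256sum < "$tmp/copy" > "$tmp/sum1" &

# tee duplicates the stream: one copy into the named pipe, one down the
# pipeline to consumer 2. Both consumers must keep reading, or tee blocks.
head -c 1048576 /dev/urandom | tee "$tmp/copy" | sha256sum > "$tmp/sum2"
wait

sum1=$(cat "$tmp/sum1")
sum2=$(cat "$tmp/sum2")
rm -rf "$tmp"
[ "$sum1" = "$sum2" ] && echo "both consumers saw identical data"
```

The data comes from /dev/urandom here, so a second run produces a different stream; only the two consumers within one run are guaranteed to agree.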
(*) When encrypting, it's important to choose the correct cipher settings. For example, aes-xts-plain repeats data after 2TB; this was fixed in aes-xts-plain64. So don't use aes-xts-plain anymore.
(**) Different cipher settings will lead to different results. The commands shown in this answer rely on some default settings, so data may not be identical in the long term. For example, cryptsetup might be using either 256- or 512-bit keys, and openssl uses a default iteration count.
(***) The reason we encrypt zeroes at all is that the kernel offers performant data sources for arbitrary amounts of zero. Otherwise, encrypting any other pattern would work just as well.