LC_ALL=C </dev/urandom \
tr '\0-\377' '[0*128][1*]' |
dd ibs=50 cbs=10 conv=unblock count=1
That will convert all input ASCII bytes (which will be all bytes, because LC_ALL=C is specified) into either a 0 or a 1 on an even distribution. The first 128 byte values, \0 through \177, are converted to zeroes and \200 through \377 to ones - and so you get to use every input byte and still output randomly ordered sequences of only 1s and 0s.
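If you want to convince yourself that the byte-value split behaves as described, here is a small sanity check (an assumed example, not from the original answer; it presumes a printf that understands octal escapes and a tr that supports the [x*n] repeat syntax):

# Feed one byte from each end of both halves of the byte range:
# \0 and \177 fall in the low 128 values and become 0,
# \200 and \377 fall in the high 128 values and become 1.
printf '\0\177\200\377' | LC_ALL=C tr '\0-\377' '[0*128][1*]'; echo
# expected output: 0011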
You were right to go with dd, but you don't need to set your bs= block-size to get 5 output lines of 11 bytes (10 + \newline) apiece. Instead you should specify a count=1 single read() for an input block of ibs=50 bytes, which can then be divided into 5 cbs=10-sized conversion blocks and conv=unblocked at cbs-size by appending a \newline to each conversion block after stripping all trailing spaces (of which you have none).
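To make the cbs/unblock mechanics concrete, here is a deterministic toy run (an assumed illustration; the printf input simply stands in for /dev/urandom):

# One 50-byte input record, cut into five 10-byte conversion blocks,
# each of which gets trailing spaces stripped (there are none) and a
# newline appended - so five 11-byte lines come out.
printf '%050d' 0 | dd ibs=50 cbs=10 conv=unblock count=1 2>/dev/null
# prints five lines of ten zeroes each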
So I just ran it and it printed:
1101001010
1100001001
1101110100
1011011000
1011110100
1+0 records in
0+1 records out
55 bytes (55 B) copied, 0.00176591 s, 31.1 kB/s
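Those numbers add up: one full 50-byte input record was read (1+0 records in), and the output - 5 conversion blocks of 10 bytes plus one appended newline each, 5 × 11 = 55 bytes - fits in a single partial output record of dd's default 512-byte obs (0+1 records out).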
I also upped the ante a little bit to show a speed comparison between one method and the other, and to demonstrate that dd's reading from a pipe is not an issue if you read in at a block factor which accounts for the writing utility's buffer size. So I first did:
time (
LC_ALL=C </dev/urandom \
tr -dc 01 |
dd ibs=4k cbs=10 conv=unblock count=k |
grep '[^01]')
...which rendered no output on stdout (so grep found no characters other than 0 or 1 in dd's output) and the following on stderr:
1024+0 records in
9011+1 records out
4613735 bytes (4.6 MB) copied, 25.8898 s, 178 kB/s
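Again the arithmetic is consistent: 1024 full 4096-byte input records are 4,194,304 bytes, and unblocking appends one newline per 10-byte conversion block, which expands the stream by a factor of 11/10 to the 4,613,735 bytes reported.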
( LC_ALL=C tr -dc 01 < /dev/urandom |\
dd ibs=4k cbs=10 conv=unblock count=k |...)\
0.80s user 25.42s system 101% cpu 25.921 total
The above information tells us that the pipeline spent 25.5 secs waiting on system calls. OK. But it also tells us that dd read in all 1024 of its 4096-byte sized input records completely, and not a single one was truncated due to an early read() return - and this is because tr buffers its piped output in 4k blocks.
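If you want to see what happens when the read size does not account for the writer's buffer, try asking dd for bigger blocks than tr will hand over in one write (an illustrative experiment, not from the original answer; the exact counts will vary by system and timing):

# With ibs=8k but a writer that flushes roughly 4k at a time, read()
# often returns only what tr has flushed so far, and dd reports such
# short reads after the '+' as partial records in.
LC_ALL=C </dev/urandom tr -dc 01 | dd ibs=8k count=8 of=/dev/null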
Anyway, doing it the other way - converting all random input on a spread spectrum - was next:
time (
LC_ALL=C </dev/urandom \
tr '\0-\377' '[0*128][1*]' |
dd ibs=4k cbs=10 conv=unblock count=k |
grep '[^01]')
Once again there was nothing on stdout - so all of dd's output was either a zero, a one, or a newline - and this on stderr:
1024+0 records in
9011+1 records out
4613735 bytes (4.6 MB) copied, 0.554202 s, 8.3 MB/s
( LC_ALL=C tr '\0-\377' '[0*128][1*]' \
< /dev/urandom|dd ibs=4k cbs=10 ...)\
0.61s user 0.36s system 171% cpu 0.571 total
...which once again demonstrates that dd read in all 1024 complete input records plus 0 truncated input records, but the processing time is significantly different. tr and dd are actually able to work in parallel here, and together they use more total user time on separate cores than it takes the whole process to complete - just under .6 seconds. That's a little faster.
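If it is useful, the whole recipe can be wrapped up in a small function (a hypothetical helper, not part of the original answer; randbits is an assumed name) that prints any number of lines of any width, so long as a single read covers them:

# randbits LINES WIDTH - print LINES lines of WIDTH random 0/1 chars each.
# A single count=1 read of LINES*WIDTH bytes suffices here because tr
# flushes its pipe output in blocks of about 4k, so keep LINES*WIDTH
# at or below that.
randbits() {
    lines=${1:-5} width=${2:-10}
    LC_ALL=C </dev/urandom tr '\0-\377' '[0*128][1*]' |
    dd ibs="$((lines * width))" cbs="$width" conv=unblock count=1 2>/dev/null
}
randbits 5 10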
I had to prefix the tr call with LC_ALL=C in order for it to work. LC_ALL=C tr -dc '01' < /dev/urandom | fold -w 30 | head -n 5 worked very well here. Thank you! – Greg Sadetsky Oct 05 '20 at 17:37