23

How can I put a bit mask on /dev/zero so that I can have a source not only for 0x00 but also for any byte between 0x01 and 0xFF?

8 Answers

18

You cannot easily do that.

You might consider writing your own kernel module providing such a device. I don't recommend that.

You could write a tiny C program that writes an infinite stream of the same byte to a pipe (or to stdout) or a FIFO.

You could use tr(1) to read from /dev/zero and translate every 0 byte to something else.

You could perhaps use yes(1), at least if you can tolerate newlines in the output (or else pipe it into tr -d '\n').
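As a minimal sketch of the tr(1) suggestion, the function below wraps the translation so that the byte can be given as a decimal value. The name byte_stream is made up for this illustration, and the sketch assumes a tr that accepts \NNN octal escapes (GNU and BSD tr both do).

```shell
# byte_stream N: emit an endless stream of the byte with decimal value N (0-255).
# The function name is hypothetical; it is not part of any standard tool.
byte_stream() {
    # tr understands \NNN octal escapes, so convert the decimal value
    # to a three-digit octal escape first (65 becomes \101).
    tr '\0' "$(printf '\\%03o' "$1")" < /dev/zero
}
```

Since the stream never ends, limit it on the consuming side, for example with byte_stream 65 | head -c 8.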

  • 10
    Or use yes 1 | tr -d $'\n' for that matter. – kojiro Jun 10 '15 at 10:50
  • 3
    @kojiro: that will fail if you try to yes a stream of \n chars. An alternative which handles \n is: yes '' | tr '\n' "$c" – where $c can be any char of the full range of ASCII characters. – Peter.O Jun 10 '15 at 23:29
  • 1
    @Peter.O I'm not sure how you interpreted my comment to mean anything other than the literal, static expression yes 1 | tr -d $'\n'. I suppose you could use a shell that doesn't do the $'' backslash treatment, or you could try to find a locale that alters tr -d $'\n', but I haven't found it yet. – kojiro Jun 11 '15 at 01:55
  • @kojiro: Your yes 1 | tr -d $'\n' will quite happily print a stream of 1 characters and almost every other single-byte value, but it cannot print a stream of \n characters. The OP wants to be able to handle all byte values "between 0x01 and 0xFF" – Peter.O Jun 11 '15 at 05:29
  • 1
    loop() { if [ "$1" = $'\n' ]; then yes "$1"; else yes "$1" | tr -d $'\n' ; fi; } – Petr Skocik Jun 11 '15 at 10:38
  • @PSkocik: as mentioned in my first comment, yes '' | tr '\n' "$c" will work fine, for all byte values -- Your if version, will put out two \n's for each \0 (that may, or may not be a problem) – Peter.O Jun 11 '15 at 11:20
  • @Peter.O Yup. I figured infinity times 2 was still infinity. But I'm not wedded to it. – Petr Skocik Jun 11 '15 at 11:21
18

The following bash code is set up to work with the byte represented in binary. However, you can easily change it to handle octal, decimal or hex by changing the radix value r from 2 to 8, 10 or 16 respectively and setting b= accordingly.

r=2; b=01111110
printf -vo '\\%o' "$(($r#$b))"; </dev/zero tr '\0' "$o"

EDIT - It does handle the full range of byte values: hex 00-FF (when I wrote 00-7F below, I was considering only single-byte UTF-8 characters).

If, for example, you only want 4 bytes (characters in the UTF-8 'ASCII'-only hex 00-7F range), you can pipe it into head: ... | head -c4

Output (4 chars):

~~~~

To see the output in 8-bit format, pipe it into xxd (or any other 1's and 0's byte dump):
e.g. b=10000000 and piping to: ... | head -c4 | xxd -b

0000000: 10000000 10000000 10000000 10000000                    ....
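The radix handling can be isolated in a small helper. This is only a sketch: the name radix_to_escape is invented here, and it assumes bash, because the base#digits arithmetic syntax is not part of POSIX sh.

```shell
# radix_to_escape RADIX DIGITS: print the \NNN octal escape that tr expects
# for a byte written as DIGITS in base RADIX (2, 8, 10 or 16).
# Hypothetical helper; requires bash for the RADIX#DIGITS arithmetic form.
radix_to_escape() {
    # $(( 2#01111110 )) evaluates to 126, which printf renders as \176.
    printf '\\%03o' "$(( $1#$2 ))"
}
```

The result can be passed straight to tr, for example </dev/zero tr '\0' "$(radix_to_escape 16 7e)" | head -c4.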
Peter.O
  • 32,916
  • 1
    Did you mean to write o=$(printf ...) for the second line? – jwodder Jun 10 '15 at 12:41
  • 1
    @jwodder: No, the second line is correct as shown. The printf option -v causes the output to directly set the variable named immediately after it; in this case that variable's name is o (for octal) - note that the -v option applies to the shell-builtin version of printf (not to the /usr/bin/printf version) – Peter.O Jun 10 '15 at 18:10
  • 2
    @jwodder Also, in general, the -v option makes sure the variable gets set to exactly what you specified. $(...) transforms the output first. Which is why o=$(printf '\n') won't have the effect you might expect, whereas printf -vo '\n' does. (It doesn't matter here, since the output here is in a form that is unaffected by such a transformation, but if you were unaware of the -v option, then this might be useful to know.) – hvd Jun 10 '15 at 21:29
13

Well, if you literally want to achieve this, you can use an LD_PRELOAD hook. The basic idea is to rewrite a function from the C library and use it instead of the normal one.

Here is a simple example where we override the read() function to XOR the output buffer with 0x42.

#define _GNU_SOURCE
#include <stdio.h>      /* perror */
#include <string.h>     /* strcmp */
#include <errno.h>
#include <sys/types.h>
#include <dlfcn.h>      /* dlsym, RTLD_NEXT */
#include <unistd.h>

static int dev_zero_fd = -1;

int open64(const char *pathname, int flags)
{
    static int (*true_open64)(const char*, int) = NULL;
    if (true_open64 == NULL) {
        if ((true_open64 = dlsym(RTLD_NEXT, "open64")) == NULL) {
            perror("dlsym");
            return -1;
        }        
    }
    int ret = true_open64(pathname, flags);
    if (strcmp(pathname, "/dev/zero") == 0) {
        dev_zero_fd = ret;
    }
    return ret;
}


ssize_t read(int fd, void *buf, size_t count)
{
    static ssize_t (*true_read)(int, void*, size_t) = NULL;
    if (true_read == NULL) {
        if ((true_read = dlsym(RTLD_NEXT, "read")) == NULL) {
            perror("dlsym");
            return -1;
        }        
    }    

    if (fd == dev_zero_fd) {
        int i;
        ssize_t ret = true_read(fd, buf, count);    
        for (i = 0; i < ret; i++) {
            *((char*)buf + i) ^= 0x42;
        }
        return ret;
    }

    return true_read(fd, buf, count);    
}

A naive implementation would XOR 0x42 into every file we read, which would have undesirable consequences. To solve this problem, I also hooked the open64() function so that it records the file descriptor associated with /dev/zero. Then we only perform the XOR in our read() function if fd == dev_zero_fd.

Usage:

$ gcc -shared -fPIC hook.c -o hook.so -ldl
$ LD_PRELOAD=$(pwd)/hook.so bash #this spawns a hooked shell
$ cat /dev/zero
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
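The output is all B because XORing a zero byte with a mask simply yields the mask, and 0x42 is the ASCII code of B. A small shell sketch of that arithmetic follows; xor_byte is a made-up name, and the sketch assumes a printf that supports %b with \0NNN escapes, as POSIX specifies.

```shell
# xor_byte A B: print the single byte whose value is A XOR B.
# Hypothetical helper; A and B may be decimal or 0x-prefixed hex.
xor_byte() {
    # Compute the XOR, keep the low 8 bits, and format the result as octal.
    # printf %b then interprets the \0NNN escape and emits the raw byte.
    printf '%b' "\\0$(printf '%03o' "$(( ($1 ^ $2) & 0xFF ))")"
}
```

For example, xor_byte 0x00 0x42 prints B, which is what the hooked read() does to every byte coming from /dev/zero.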
muru
  • 72,889
yoann
  • 228
  • 3
    Given your implementation, you could have a symbolic link from /dev/capbee to /dev/zero, search for /dev/capbee and leave /dev/zero alone. //dev/zero won't be the same as /dev/zero. – Robert Jacobs Jun 10 '15 at 21:10
  • 1
    @RobertJacobs Indeed. We could even generate symlinks /dev/0x01, /dev/0x02, /dev/0x03, ... to /dev/zero and parse the filename to determine the bitmask to apply. – yoann Jun 10 '15 at 23:06
11

In terms of speed, the fastest I found was:

$ PERLIO=:unix perl -e '$s="\1" x 65536; for(;;){print $s}' | pv -a > /dev/null
[4.02GiB/s]

For comparison:

$ tr '\0' '\1' < /dev/zero | pv -a > /dev/null
[ 765MiB/s]
$ busybox tr '\0' '\1' < /dev/zero | pv -a > /dev/null
[ 399MiB/s]

$ yes $'\1' | tr -d '\n' | pv -a > /dev/null
[26.7MiB/s]

$ dash -c 'while : ; do echo -n "\1"; done' | pv -a > /dev/null
[ 225KiB/s]
$ bash -c 'while : ; do echo -ne "\1"; done' | pv -a > /dev/null
[ 180KiB/s]

$ < /dev/zero pv -a > /dev/null
[5.56GiB/s]
$ cat /dev/zero | pv -a > /dev/null
[2.82GiB/s]
  • On my Debian system, perl yields 2.13GiB/s, while < /dev/zero yields 8.73GiB/s. What could affect the performance? – cuonglm Jun 11 '15 at 10:53
  • @cuonglm, yes, I see some variation between systems, but perl is consistently faster than the other solutions. I get the same throughput as with the equivalent compiled C program. The benchmark is as much on the application as on the system's scheduler here. What makes the most difference is the size of the buffers being written. – Stéphane Chazelas Jun 11 '15 at 11:01
  • @cuonglm The pipe slows it down too. I think cat /dev/zero | pv -a >/dev/null will give you about 2 GiB per second too (it does on my system), while < /dev/zero gives me around 6 GiB/s. – Petr Skocik Jun 11 '15 at 11:11
  • @StéphaneChazelas May I ask what system are you on, Stéphane Chazelas? The results on mine quite differ (I can get about 2.1GiB out of the perl version). I'm on Linux ProBook 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Intel i5 Core inside. – Petr Skocik Jun 11 '15 at 11:17
  • @PSkocik, yes. I have those timing in my answer. That's why I say it's a scheduler benchmark. In the case of a pipe, control goes back and forth between the commands in the pipeline. – Stéphane Chazelas Jun 11 '15 at 11:23
  • 1
    @PSkocik, Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3 (2015-04-23) x86_64 GNU/Linux, Intel(R) Core(TM)2 Duo CPU T9600 @ 2.80GHz. The newer kernel seems to make a difference (unless it's the newer perl: v5.20.2) – Stéphane Chazelas Jun 11 '15 at 11:25
  • @StéphaneChazelas: The same OS as yours, AMD Phenom 3.0 GHz. – cuonglm Jun 11 '15 at 16:55
7

It's kind of pointless to try and bitmask/xor zero bytes, isn't it? Taking a byte and xoring it with zero is a no-op.

Just create a loop that gives you the bytes you want and put it behind a pipe or named pipe. It'll behave pretty much the same as a character device (won't waste CPU cycles when idle):

mkfifo pipe
while : ; do echo -n "a"; done > pipe &
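The echo loop writes one byte per iteration. A variation on the same named-pipe idea, sketched below, lets tr do the writing instead. The names start_byte_fifo and fifo_pid are invented for this illustration, and the sketch assumes /dev/zero and a tr that accepts \NNN octal escapes.

```shell
# start_byte_fifo PATH VALUE: create a FIFO at PATH and feed it an endless
# stream of the byte with decimal value VALUE (0-255).
# The writer's PID is left in the hypothetical variable fifo_pid so that the
# caller can stop it later.
start_byte_fifo() {
    mkfifo "$1" || return 1
    # The background writer blocks in open() until a reader opens the FIFO,
    # so it uses no CPU while nobody is reading.
    tr '\0' "$(printf '\\%03o' "$2")" < /dev/zero > "$1" &
    fifo_pid=$!
}
```

A reader then consumes the FIFO like a file, for example head -c 16 /tmp/stream after start_byte_fifo /tmp/stream 97. The writer ends as soon as the reader closes the FIFO; remove the FIFO with rm when it is no longer needed.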

And if you want to super-optimize it, you can use the C code below:

#include <stdio.h>   /* BUFSIZ */
#include <string.h>  /* memset */
#include <unistd.h>  /* write */

int main(int argc, char **argv) { 
  char c = argc == 1+1 ? argv[1][0] : 'y';

  char buff[BUFSIZ];
  memset(buff, c, BUFSIZ);

  for(;;){ 
    write(1, buff, sizeof(buff)); 
  }
}

compile & run

$ CFLAGS=-O3 make loop
./loop "$the_byte_you_want" > pipe

Performance test:

./loop 1 | pv -a >/dev/null 

2.1GB/s on my machine (even slightly faster than cat /dev/zero | pv -a >/dev/null)

Petr Skocik
  • 28,816
5

Read zeros, translate each zero to your pattern!

We read zero bytes out of /dev/zero, and use tr to apply a bit mask to each of the bytes by translating each zero byte:

$ </dev/zero tr '\000' '\176' | head -c 10
~~~~~~~~~~$

Octal 176 is the ASCII code of ~, so we get 10 ~ characters. (The $ at the end of the output is how my shell indicates that there was no trailing newline; it may look different for you.)

So, let's create 0xFF bytes: hex 0xFF is octal 0377. The leading zero is left out on the tr command line. At the end, hexdump is used to make the output readable.

$ </dev/zero tr '\000' '\377' | head -c 10 | hexdump
0000000 ffff ffff ffff ffff ffff               
000000a

You need to use the octal codes of the characters here, instead of the hexadecimal. So it's the range from \000 to octal \377 (same as 0xFF).
Use ascii -x and ascii -o to get a table of the characters with hexadecimal or octal index numbers.
(For a table with decimal and hexadecimal, just ascii).
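If the ascii utility is not installed, printf can do the hex-to-octal conversion. The helper below is only a sketch with an invented name (hex_to_tr_escape); it relies on printf accepting 0x-prefixed numeric arguments, which POSIX specifies.

```shell
# hex_to_tr_escape HH: print the \NNN octal escape for the hex byte HH,
# ready to be used as the second argument of tr.
# Hypothetical helper, not part of any standard tool.
hex_to_tr_escape() {
    # printf parses 0xHH as a C-style integer constant and prints it in octal.
    printf '\\%03o\n' "0x$1"
}
```

For example, hex_to_tr_escape FF prints \377, so </dev/zero tr '\000' "$(hex_to_tr_escape FF)" | head -c 10 | hexdump gives the same ten 0xFF bytes.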

Quite fast

It runs fairly fast compared to just reading the zeros: cat /dev/zero is only four to five times as fast, even though it can make perfect use of I/O buffering, which tr cannot.

$ </dev/zero tr '\000' '\176' | pv -a >/dev/null
[ 913MB/s]

$ </dev/zero cat | pv -a >/dev/null
[4.37GB/s]

Volker Siegel
  • 17,283
3

It depends on what you want to do with the data and how flexibly you want to use it.

In the worst case, if you need speed, you could do what /dev/zero does and just compile /dev/one, /dev/two, ... /dev/fortytwo ... and so on as devices.

In most cases it is better to create the data directly where it is needed, i.e. inside a program or script as a constant. With more information, people could help you better.

guest
  • 31
1

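Infinite printf loop

Replace \x01 with the byte you want, written as \x followed by two hex digits. (In bash, a \u escape emits a character encoded for the current locale rather than a raw byte, so it does not work for values above 0x7F in a UTF-8 locale.)

while true ; do printf '\x01' ; done | yourapp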

C++ code:

#include<cstdio>

int main(){
 char out=Byte;
 while(true)
 fwrite(&out,sizeof(out),1,stdout);
}

Compile: replace Byte with the value you want.

g++ -O3 -o bin file.cpp -D Byte=0x01

Use

./bin | yourapp

ncomputers
  • 1,524