How can I put a bit mask on /dev/zero so that I can get bytes other than zero?

Question

How can I put a bit mask on /dev/zero so that I can have a source not only for 0x00 but also for any byte between 0x01 and 0xFF?

You can use this answer as reference: http://stackoverflow.com/questions/12634503/how-to-use-xor-in-bash — Romeo Ninov, Jun 10 '15 at 08:56
I gave an answer to this question, but reading it again I think I misunderstood it. Do you want to translate each 0x00 to a specific value or to a random value in the 0x00-0xFF range? — kos, Jun 11 '15 at 00:48
@kos each to a specific value like 444444... not a random one — Eduard Florinescu, Jun 11 '15 at 01:07

Basile Starynkevitch · Answer 1 · 2015-06-11T06:17:18.497

18

You cannot easily do that.

You might consider writing your own kernel module providing such a device. I don't recommend that.

You could write a tiny C program that writes an infinite stream of the same byte to a pipe, a FIFO, or stdout.

You could use tr(1) to read from /dev/zero and translate every 0 byte to something else.

You could perhaps use yes(1), at least if you can afford having newlines (otherwise pipe it into tr -d '\n'...)
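A minimal sketch of the tr(1) and yes(1) options, assuming a GNU or BusyBox userland. head -c bounds the otherwise infinite stream, and the byte 'A' (octal 101) is only an illustrative choice:

```shell
# Option 1: translate every NUL byte read from /dev/zero into 'A' (octal 101).
from_tr=$(tr '\0' '\101' </dev/zero | head -c 8)

# Option 2: let yes(1) repeat "A\n" forever, then delete the newlines.
from_yes=$(yes A | tr -d '\n' | head -c 8)

printf '%s\n%s\n' "$from_tr" "$from_yes"
```

Both variables should end up holding eight 'A' characters. The yes variant cannot produce a stream of newline bytes, because tr -d removes them.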

edited Jun 11 '15 at 06:17

answered Jun 10 '15 at 08:58

Basile Starynkevitch

10,561

10

Or use yes 1 | tr -d $'\n' for that matter. – kojiro Jun 10 '15 at 10:50
3

@kojiro: that will fail if you try to yes a stream of \n chars. An alternative which handles \n is: yes '' | tr '\n' "$c" – where $c can be any char of the full range of ASCII characters. – Peter.O Jun 10 '15 at 23:29
1

@Peter.O I'm not sure how you interpreted my comment to mean anything other than the literal, static expression yes 1 | tr -d $'\n'. I suppose you could use a shell that doesn't do the $'' backslash treatment, or you could try to find a locale that alters tr -d $'\n', but I haven't found it yet. – kojiro Jun 11 '15 at 01:55
@kojiro: Your yes 1 | tr -d $'\n' will quite happily print a stream of 1 characters and almost every other single-byte value, but it cannot print a stream of \n characters. The OP wants to be able to handle all byte values "between 0x01 and 0xFF" – Peter.O Jun 11 '15 at 05:29
1

loop() { if [ "$1" = $'\n' ]; then yes "$1"; else yes "$1" | tr -d $'\n' ; fi; } – Petr Skocik Jun 11 '15 at 10:38
@PSkocik: as mentioned in my first comment, yes '' | tr '\n' "$c" will work fine, for all byte values -- Your if version, will put out two \n's for each \0 (that may, or may not be a problem) – Peter.O Jun 11 '15 at 11:20
@Peter.O Yup. I figured infinity times 2 was still infinity. But I'm not wedded to it. – Petr Skocik Jun 11 '15 at 11:21

score 18 · Accepted Answer · edited Jun 11 '15 at 13:31

18

The following bash code is set up to work with the byte represented in binary. However, you can easily change it to handle octal, decimal or hex by changing the radix value r from 2 to 8, 10 or 16 respectively and setting b= accordingly.

r=2; b=01111110
printf -vo '\\%o' "$(($r#$b))"; </dev/zero tr '\0' "$o"

EDIT - It does handle the full range of byte values: hex 00-FF (when I wrote 00-7F below, I was considering only single-byte UTF-8 characters).

If, for example, you only want 4 bytes ~~(characters in the UTF-8 'ASCII'-only hex 00-7F range)~~, you can pipe it into head: ... | head -c4

Output (4 chars):

~~~~

To see the output in 8-bit format, pipe it into xxd (or any other 1's and 0's byte dump):
e.g. b=10000000 and piping to: ... | head -c4 | xxd -b

0000000: 10000000 10000000 10000000 10000000                    ....
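To illustrate the radix switch, here is a small sketch. It is bash only, since it relies on printf -v and base#n arithmetic. The radix:value pairs are my own examples; all four spell the same byte, 0x7E:

```shell
# Each pair is radix:digits; every pair encodes the value 126 (0x7E, '~').
for spec in 2:01111110 8:176 10:126 16:7e; do
  r=${spec%%:*}   # text before the colon: the radix
  b=${spec#*:}    # text after the colon: the digits in that radix
  printf -v o '\\%o' "$(($r#$b))"   # o becomes the octal escape for tr
  printf 'r=%s b=%s -> %s\n' "$r" "$b" "$o"
done
```

Every line should end in \176, which is the escape that is then handed to tr as "$o".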

edited Jun 11 '15 at 13:31

Stéphane Chazelas

544,893

answered Jun 10 '15 at 09:43

Peter.O

32,916

1

Did you mean to write o=$(printf ...) for the second line? – jwodder Jun 10 '15 at 12:41
1

@jwodder: No, the second line is correct as shown. The printf option -v causes the output to directly set the variable named immediately after it; in this case that variable's name is o (for octal) - note that the -v option applies to the shell-builtin version of printf (not to the /usr/bin/printf version) – Peter.O Jun 10 '15 at 18:10
2

@jwodder Also, in general, the -v option makes sure the variable gets set to exactly what you specified. $(...) transforms the output first. Which is why o=$(printf '\n') won't have the effect you might expect, whereas printf -vo '\n' does. (It doesn't matter here, since the output here is in a form that is unaffected by such a transformation, but if you were unaware of the -v option, then this might be useful to know.) – hvd Jun 10 '15 at 21:29
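The difference between the two forms can be checked directly in bash. A small sketch (the variable names a and b are arbitrary): command substitution strips trailing newlines, while printf -v stores the text verbatim.

```shell
# Command substitution removes trailing newlines, so a ends up empty.
a=$(printf '\n')

# printf -v (bash builtin) assigns the formatted text unchanged.
printf -v b '\n'

printf 'a has %d character(s), b has %d\n' "${#a}" "${#b}"
```

This should report 0 characters for a and 1 for b.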

score 13 · Answer 3 · edited Sep 27 '15 at 06:05

Well, if you literally want to achieve this, you can use a LD_PRELOAD hook. The basic idea is to rewrite a function from the C library and use it instead of the normal one.

Here is a simple example where we override the read() function to XOR the output buffer with 0x42.

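#define _GNU_SOURCE
#include <stdio.h>      /* perror */
#include <stdarg.h>     /* va_list, for the optional mode argument */
#include <string.h>
#include <errno.h>
#include <fcntl.h>      /* O_CREAT */
#include <sys/types.h>
#include <dlfcn.h>
#include <unistd.h>

/* Descriptor most recently returned for /dev/zero, or -1 if none yet. */
static int dev_zero_fd = -1;

int open64(const char *pathname, int flags, ...)
{
    static int (*true_open64)(const char*, int, ...) = NULL;
    mode_t mode = 0;

    if (true_open64 == NULL) {
        if ((true_open64 = dlsym(RTLD_NEXT, "open64")) == NULL) {
            perror("dlsym");
            return -1;
        }
    }
    /* The mode argument is only passed when a file may be created. */
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }
    int ret = true_open64(pathname, flags, mode);
    /* Remember the descriptor only if /dev/zero was opened successfully. */
    if (ret != -1 && strcmp(pathname, "/dev/zero") == 0) {
        dev_zero_fd = ret;
    }
    return ret;
}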


ssize_t read(int fd, void *buf, size_t count)
{
    static ssize_t (*true_read)(int, void*, size_t) = NULL;
    if (true_read == NULL) {
        if ((true_read = dlsym(RTLD_NEXT, "read")) == NULL) {
            perror("dlsym");
            return -1;
        }        
    }    

    if (fd == dev_zero_fd) {
        int i;
        ssize_t ret = true_read(fd, buf, count);    
        for (i = 0; i < ret; i++) {
            *((char*)buf + i) ^= 0x42;
        }
        return ret;
    }

    return true_read(fd, buf, count);    
}

A naive implementation would XOR 0x42 into every file we read, which would have undesirable consequences. To solve this problem, I also hooked open64(), making it record the file descriptor associated with /dev/zero. Then, we only perform the XOR in our read() function if fd == dev_zero_fd.

Usage:

$ gcc -shared -fPIC hook.c -o hook.so -ldl
$ LD_PRELOAD=$(pwd)/hook.so bash #this spawns a hooked shell
$ cat /dev/zero
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

Given your implementation, you could have a symbolic link from /dev/capbee to /dev/zero, search for /dev/capbee and leave /dev/zero alone. //dev/zero won't be the same as /dev/zero. — Robert Jacobs, Jun 10 '15 at 21:10
@RobertJacobs Indeed. We could even generate symlinks /dev/0x01, /dev/0x02, /dev/0x03, ... to /dev/zero and parse the filename to determine the bitmask to apply. — yoann, Jun 10 '15 at 23:06

score 11 · Answer 4 · edited Apr 13 '17 at 12:36

11

In terms of speed, the fastest I found was:

$ PERLIO=:unix perl -e '$s="\1" x 65536; for(;;){print $s}' | pv -a > /dev/null
[4.02GiB/s]

For comparison:

$ tr '\0' '\1' < /dev/zero | pv -a > /dev/null
[ 765MiB/s]
$ busybox tr '\0' '\1' < /dev/zero | pv -a > /dev/null
[ 399MiB/s]

$ yes $'\1' | tr -d '\n' | pv -a > /dev/null
[26.7MiB/s]

$ dash -c 'while : ; do echo -n "\1"; done' | pv -a > /dev/null
[ 225KiB/s]

$ bash -c 'while : ; do echo -ne "\1"; done' | pv -a > /dev/null
[ 180KiB/s]

$ < /dev/zero pv -a > /dev/null
[5.56GiB/s]
$ cat /dev/zero | pv -a > /dev/null
[2.82GiB/s]

edited Apr 13 '17 at 12:36

Community

1

answered Jun 11 '15 at 10:38

Stéphane Chazelas

544,893

On my Debian, perl yields 2.13GiB, while < /dev/zero yields 8.73GiB. What can affect the performance? – cuonglm Jun 11 '15 at 10:53
@cuonglm, yes, I see some variation between systems, but perl is consistently faster than the other solutions. I get the same throughput as with the equivalent compiled C program. The benchmark is as much on the application as on the system's scheduler here. What makes the most difference is the size of the buffers being written. – Stéphane Chazelas Jun 11 '15 at 11:01
@cuonglm The pipe slows it down too. I think cat /dev/zero | pv -a >/dev/null will give you about 2 GiB per second too (it does on my system, while < /dev/zero gives me around 6 GiB/s). – Petr Skocik Jun 11 '15 at 11:11
@StéphaneChazelas May I ask what system are you on, Stéphane Chazelas? The results on mine quite differ (I can get about 2.1GiB out of the perl version). I'm on Linux ProBook 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Intel i5 Core inside. – Petr Skocik Jun 11 '15 at 11:17
@PSkocik, yes. I have those timing in my answer. That's why I say it's a scheduler benchmark. In the case of a pipe, control goes back and forth between the commands in the pipeline. – Stéphane Chazelas Jun 11 '15 at 11:23
1

@PSkocik, Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3 (2015-04-23) x86_64 GNU/Linux, Intel(R) Core(TM)2 Duo CPU T9600 @ 2.80GHz. The newer kernel seems to make a difference (unless it's the newer perl: v5.20.2) – Stéphane Chazelas Jun 11 '15 at 11:25
@StéphaneChazelas: The same OS as yours, AMD Phenom 3.0 GHz. – cuonglm Jun 11 '15 at 16:55

Petr Skocik · Answer 5 · 2015-06-11T13:10:31.637

7

It's kind of pointless to try and bitmask/xor zero bytes, isn't it? Taking a byte and xoring it with zero is a no-op.

Just create a loop that gives you the bytes you want and put it behind a pipe or named pipe. It'll behave pretty much the same as a character device (won't waste CPU cycles when idle):

mkfifo pipe
while : ; do echo -n "a"; done > pipe &
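A hedged sketch of how a reader might consume such a FIFO. The temporary directory and the variable names are assumptions for illustration, and head -c bounds the read:

```shell
dir=$(mktemp -d)
mkfifo "$dir/pipe"

# The writer blocks when opening the FIFO until a reader attaches.
while : ; do echo -n "a"; done > "$dir/pipe" &
writer=$!

# Read exactly 8 bytes; once head exits, the writer has no reader left.
first=$(head -c 8 "$dir/pipe")

# Stop the writer in case it is still alive, then clean up.
kill "$writer" 2>/dev/null || true
wait "$writer" 2>/dev/null || true
rm -r "$dir"

printf '%s\n' "$first"
```

While no reader has the FIFO open, the writer sits blocked, which is why the loop does not burn CPU when idle.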

And if you want to super-optimize it, you can use the C code below:

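#include <stdio.h>   /* BUFSIZ */
#include <string.h>  /* memset */
#include <unistd.h>  /* write */

int main(int argc, char **argv) {
  /* Use the first character of the single argument, or 'y' by default. */
  char c = argc == 1+1 ? argv[1][0] : 'y';

  char buff[BUFSIZ];
  memset(buff, c, BUFSIZ);

  /* Write whole buffers forever; stop if a write fails. */
  for(;;){
    if (write(1, buff, sizeof(buff)) < 0)
      return 1;
  }
}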

compile & run

$ CFLAGS=-O3 make loop
./loop "$the_byte_you_want" > pipe

Performance test:

./loop 1 | pv -a >/dev/null

2.1GB/s on my machine (even slightly faster than cat /dev/zero | pv -a >/dev/null)

edited Jun 11 '15 at 13:10

answered Jun 11 '15 at 08:22

Petr Skocik

28,816

I originally tried using putchar in C, but it was slow. – Petr Skocik Jun 11 '15 at 11:07
Out of curiosity, why argc == 1+1 instead of argc == 2? – Reinstate Monica -- notmaynard Jun 11 '15 at 17:04
@iamnotmaynard To remind myself that it's 1 for the command line executable plus 1 argument. :-D – Petr Skocik Jun 11 '15 at 20:49
Ah. That was my guess, but wanted to make sure there wasn't some secret reason. – Reinstate Monica -- notmaynard Jun 11 '15 at 21:04
"Taking a byte and xoring it with zero is a no-op." This isn't true: 0 XOR X == X. – jacwah Jul 17 '15 at 21:22
@jacwah Effectively, XORring with 0 hasn't affected your X in any way whatsoever. That's the definition of a no-op. – Petr Skocik Jul 17 '15 at 21:42
I see what you mean now. I interpreted it as 0 XOR anything will always be 0. It's not pointless in this case if there for example was a xor command. Then one could pipe /dev/zero to XOR the bytes into anything. You might want to clarify your answer. – jacwah Jul 17 '15 at 22:06
The point is, if you want an infinite stream of the byte 'a', just write 'a' in a loop. There's no point in XORring those 'a's with '\0's to get 'a's again. – Petr Skocik Jul 17 '15 at 22:49
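The XOR identity argued over in these comments can be checked with shell arithmetic. A tiny sketch, with 0x42 as an arbitrary example value:

```shell
x=$((0x42))

# XOR with zero leaves the value unchanged ...
same=$(( x ^ 0x00 ))

# ... so masking a zero byte simply yields the mask itself.
from_zero=$(( 0x00 ^ x ))

printf '0x%02x 0x%02x\n' "$same" "$from_zero"
```

Both results are 0x42, which is why XOR-ing a stream of zeros with a constant is just a roundabout way of emitting that constant.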

score 5 · Answer 6 · edited Jun 11 '20 at 12:04

Read zeros, translate each zero to your pattern!

We read zero bytes out of /dev/zero, and use tr to apply a bit mask to each of the bytes by translating each zero byte:

$ </dev/zero tr '\000' '\176' | head -c 10
~~~~~~~~~~$

Octal 176 is the ASCII code of ~, so we get 10 ~ characters. (The $ at the end of the output is how my shell shows that there was no line ending; it may look different for you.)

So, let's create 0xFF bytes: hex 0xFF is octal 0377. The leading zero is left out on the tr command line; at the end, hexdump is used to make the output readable.

$ </dev/zero tr '\000' '\377' | head -c 10 | hexdump
0000000 ffff ffff ffff ffff ffff               
000000a

You need to use the octal codes of the characters here, instead of the hexadecimal. So it's the range from \000 to octal \377 (same as 0xFF).
Use ascii -x and ascii -o to get a table of the characters with hexadecimal or octal index numbers.
(For a table with decimal and hexadecimal, just ascii).
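If the ascii tool is not installed, printf can do the hex-to-octal conversion. A minimal sketch, where the variable names hex and esc are my own and od stands in for hexdump:

```shell
# Build the octal escape that tr expects from a hexadecimal byte value.
hex=0xFF
esc=$(printf '\\%03o' "$hex")    # esc now holds the four characters \377

# Turn four zero bytes into the chosen byte and dump them as hex.
head -c 4 </dev/zero | tr '\000' "$esc" | od -An -tx1
```

The dump should show four ff bytes.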

Quite fast

It runs fairly fast compared to just reading the zeros: cat /dev/zero is only about four to five times as fast, even though cat can make full use of I/O buffering, which tr cannot.

$ </dev/zero tr '\000' '\176' | pv -a >/dev/null
[ 913MB/s]
$ </dev/zero cat | pv -a >/dev/null
[4.37GB/s]

score 3 · Answer 7 · answered Jun 10 '15 at 20:46

It depends on what you want to do with the data and how flexibly you want to use it.

In the worst case, if you need speed, you could do what is done for /dev/zero and compile dedicated devices: /dev/one, /dev/two, ... /dev/fourtytwo, and so on.

In most cases it should be better to create the data directly where it is needed, that is, inside the program or script, as a constant. With more information, people could help you better.

ncomputers · Answer 8 · 2016-02-06T07:50:48.023

1

Infinite printf loop

Replace \x00 with the byte you want, written as a \xHH hexadecimal escape.

while true ; do printf "\x00" ; done | yourapp

C++ code:

#include<cstdio>

int main(){
 char out=Byte;
 while(true)
 fwrite(&out,sizeof(out),1,stdout);
}

Compile: replace Byte with the value you want.

g++ -O3 -o bin file.cpp -D Byte=0x01

Use

./bin | yourapp

edited Feb 06 '16 at 07:50

answered Feb 06 '16 at 07:33

ncomputers

