16

I was solving a challenge and found a data file with no file extension. The file command reports it as plain data (application/octet-stream), and hd shows GNP. on the last line of its output. So if I reverse the bytes of this file I should get a PNG file. I searched everywhere but couldn't find a solution explaining how to reverse the content of a binary file.
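
The observation about GNP. can be checked mechanically. A PNG file starts with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A, so a byte-reversed PNG should end with those bytes in the opposite order. The function name below is made up for illustration; this is only a sketch of that check.

```shell
# looks_like_reversed_png FILE  (hypothetical helper name)
# Succeeds if the last 8 bytes of FILE are the PNG signature backwards.
looks_like_reversed_png() {
  # Dump the last 8 bytes as hex and strip od's spacing.
  tail_hex=$(tail -c 8 "$1" | od -An -v -tx1 | tr -d ' \n')
  # PNG signature 89 50 4e 47 0d 0a 1a 0a, read from right to left.
  [ "$tail_hex" = "0a1a0a0d474e5089" ]
}
```

Usage would be something like `looks_like_reversed_png datafile && echo "probably a reversed PNG"`.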

Solomon Ucko
  • 147
  • 1
  • 9
Prvt_Yadav
  • 5,882

5 Answers

20

With xxd (from vim) and tac (from GNU coreutils, also tail -r on some systems):

< file.gnp xxd -p -c1 | tac | xxd -p -r > file.png
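
If xxd is not installed, the same dump, reverse, and rebuild idea can be sketched with od and printf instead. This is an assumption-laden variant, not part of the answer above: it assumes a tac command is available and that the shell's printf accepts three-digit octal escapes, and it will be much slower than the xxd pipeline because it runs printf once per byte.

```shell
# reverse_bytes: write the bytes of stdin to stdout in reverse order.
# Hypothetical helper; uses only od, tr, grep, tac and printf.
reverse_bytes() {
  od -An -v -to1 |       # dump every byte as a 3-digit octal number
    tr -s ' ' '\n' |     # put each number on its own line
    grep . |             # drop the empty leading line
    tac |                # reverse the order of the lines
    while read -r oct; do
      printf "\\$oct"    # turn each octal number back into one byte
    done
}
```

It would be used the same way as the pipeline above: `reverse_bytes < file.gnp > file.png`.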
5

In zsh (the only shell that can internally deal with binary data (unless you want to consider ksh93's base64 encoding approach)):

zmodload zsh/mapfile
(LC_ALL=C; printf %s ${(s::Oa)mapfile[file.gnp]} > file.png)
  • LC_ALL=C: characters are bytes
  • $mapfile[file.gnp]: content of file.gnp file
  • s::: split the string into its byte constituents
  • Oa: sort the resulting array in reverse Order of array subscript, i.e. reverse that array
3

With perl:

perl -0777pe '$_=reverse $_'  [input_file]

Performance test:

dd if=/dev/urandom of=/tmp/a bs=1M count=1
LC_ALL=C tac -rs $'.\\|\n' /tmp/a > /tmp/r

time perl -0777pe '$_=reverse $_' /tmp/a         | diff -q - /tmp/r
time xxd -p -c1 /tmp/a | tac | xxd -p -r         | diff -q - /tmp/r
time perl -0777 -F -ape '$_=reverse@F' /tmp/a    | diff -q - /tmp/r
time LC_ALL=C tac -rs $'.\\|\n' /tmp/a           | diff -q - /tmp/r

Result:

  • Tested locally: my solution (perl -0777pe '$_=reverse $_') is the fastest, perl -0777 -F is the slowest.
  • Tested on Try it online!: my solution is the fastest, xxd is the slowest.

Note: the time spent in diff should be the same for every solution, since each one should produce the same output.

  • 1
    I've deleted my perl one. I hadn't realised at the time reverse could reverse strings as well, so doing that splitting didn't make much sense and your version is much much better. – Stéphane Chazelas Sep 22 '19 at 17:37
2

Here is one way of reversing a binary file using ksh93. I have left the code "loose" to make it easier to understand.

#!/bin/ksh93

typeset -b byte

redirect 3< image.gpj || exit 1

eof=$(3<#((EOF)))

read -r -u 3 -N 1 byte
printf "%B" byte > image.jpg
3<#((CUR - 1))

while (( $(3<#) > 0 ))
do
    read -r -u 3 -N 1 byte
    printf "%B" byte >> image.jpg
    3<#((CUR - 2))
done

read -r -u 3 -N 1 byte
printf "%B" byte >> image.jpg

redirect 3<&- || echo 'cannot close FD 3'

exit 0
fpmurphy
  • 4,636
  • nice. That's the only answer so far that doesn't involve storing the whole file in memory. However, it's terribly inefficient in that it makes several system calls for each byte of the file (and conversions to/from base64), so wouldn't be suitable for files that don't fit in memory either. On my machine, it processes files at about 10KB/s – Stéphane Chazelas Jan 12 '18 at 15:51
  • Note that the first read above should read nothing as it's done at the end of the file. – Stéphane Chazelas Jan 12 '18 at 15:58
  • Trying to understand why it was so slow, I tried running it under strace and ksh93 seems to be behaving very weirdly, where it seeks all over the place within the file and reads large amounts at the time. Maybe a variant of https://github.com/att/ast/issues/15 – Stéphane Chazelas Jan 12 '18 at 16:00
  • @StéphaneChazelas. No mystery as to why it is relatively slow. Within the loop it has to seek backwards each time it reads a byte. This can easily be significantly reduced by a factor of 20 or even more by reading and writing more than one byte at a time. The write side of things can similarly be optimized. Lots of other techniques are available to further speed things up. I will leave that exercise up to you. – fpmurphy Jan 13 '18 at 05:49
  • Try strace on the script to see what I mean. ksh93 reads the files thousands of times over. For instance, before reading the first byte, it seeks 64KiB off the end of the file, reads 64KiB, then seeks before the last byte and reads 1 byte and does something similar for every byte. Note that what you can do with those base64 encoded strings is limited, so if you read more than one byte at a time, it's going to be more difficult to extract the individual bytes of that. – Stéphane Chazelas Jan 13 '18 at 09:23
1

I tried the following:

tac -rs '.' input.gnp > output.png

The idea is to force tac to treat every character as a separator. I tried that on a binary file and it seemed to work, but any confirmation would be appreciated.

The main advantage is that it does not load the file into memory.

RFen
  • 11