Drawing a histogram from a bash command output

Question

I have the following output:

2015/1/7    8
2015/1/8    49
2015/1/9    40
2015/1/10   337
2015/1/11   11
2015/1/12   3
2015/1/13   9
2015/1/14   102
2015/1/15   62
2015/1/16   10
2015/1/17   30
2015/1/18   30
2015/1/19   1
2015/1/20   3
2015/1/21   23
2015/1/22   12
2015/1/24   6
2015/1/25   3
2015/1/27   2
2015/1/28   16
2015/1/29   1
2015/2/1    12
2015/2/2    2
2015/2/3    1
2015/2/4    10
2015/2/5    13
2015/2/6    2
2015/2/9    2
2015/2/10   25
2015/2/11   1
2015/2/12   6
2015/2/13   12
2015/2/14   2
2015/2/16   8
2015/2/17   8
2015/2/20   1
2015/2/23   1
2015/2/27   1
2015/3/2    3
2015/3/3    2

And I'd like to draw a histogram like this:

2015/1/7  ===
2015/1/8  ===========
2015/1/9  ==========
2015/1/10 ====================================================================
2015/1/11 ===
2015/1/12 =
...

Do you know if there is a bash command that would let me do that?

That is indeed one of the risks of providing links instead of self-contained answers. If the deleted SO answer is useful, please post it as an answer here. — Jeff Schaller, May 21 '19 at 14:59

score 18 · Answer 1 · edited May 23 '17 at 12:40

In perl:

perl -pe 's/ (\d+)$/"="x$1/e' file

The e flag makes Perl evaluate the replacement as an expression, so = is repeated $1 times ($1 is the number matched by (\d+)).
You could use "="x($1\/3) instead of "="x$1 to get shorter lines. (The / is escaped because it is also the delimiter of the substitution command.)

In bash (inspired by this SO answer):

while read d n 
do 
    printf "%s\t%${n}s\n" "$d" = | tr ' ' '=' 
done < test.txt

printf pads the second string with spaces to a width of $n (%${n}s), and tr replaces the spaces with =.
The columns are delimited with a tab (\t), but you can make the output prettier by piping it to column -t -s $'\t' (in bash, $'\t' is a literal tab, whereas '\t' in single quotes is a backslash followed by t).
You could use $((n/3)) instead of ${n} to get shorter lines.
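As a rough sketch of the scaling idea above: plain $((n/3)) rounds down, so counts of 1 or 2 would produce no bar at all. The variant below rounds up instead. The names bar and histogram are hypothetical helpers introduced for illustration, and the input is assumed to be "date count" lines on stdin.

```shell
#!/bin/sh
# bar N: print N "=" characters followed by a newline.
# (bar and histogram are hypothetical helper names, not from the answer.)
bar() {
    # %*s pads an empty string with spaces to width N; tr turns them into "=".
    printf '%*s' "$1" '' | tr ' ' '='
    echo
}

# histogram DIVISOR: read "date count" lines from stdin and draw each bar at
# 1/DIVISOR of its count, rounded up so that small counts stay visible.
histogram() {
    divisor=${1:-3}
    while read -r d n; do
        printf '%s\t' "$d"
        bar $(( (n + divisor - 1) / divisor ))
    done
}

printf '%s\n' '2015/1/7 8' '2015/1/19 1' | histogram 3
```

With a divisor of 3, the count 8 becomes a bar of three characters and the count 1 still gets one character.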

Another version:

unset IFS; printf "%s\t%*s\n" $(sed 's/$/ =/' test.txt) | tr ' ' =

The only drawback I can see is that you need to pipe sed's output through something else if you want to scale the bars down; otherwise this is the cleanest option. If your input file might contain any of the characters [, ? or *, prefix the command with set -f; to disable globbing.
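One possible way to scale this version down, sketched under assumptions: replace sed with awk, which can divide the count and append the = in one step. The helper name scaled_bars and the choice of adding 1 (so that every line gets at least one character) are illustrative, not part of the original answer.

```shell
#!/bin/sh
# scaled_bars DIVISOR: read "date count" lines from stdin, print scaled bars.
# (Hypothetical helper name; the body runs in a subshell so that set -f and
# unset IFS do not leak into the caller.)
scaled_bars() (
    set -f      # no globbing on the unquoted substitution below
    unset IFS   # default word splitting on spaces, tabs and newlines
    # awk emits "date width =" per line; printf reuses its format for each
    # triple, right-aligning "=" in a field of the scaled width.
    printf '%s\t%*s\n' $(awk -v d="${1:-3}" '{ print $1, int($2 / d) + 1, "=" }') |
        tr ' ' '='
)

printf '%s\n' '2015/1/7 8' '2015/1/8 49' | scaled_bars 3
```

Because the whole input is expanded into printf's argument list, this approach inherits the same argument-length limits as the original one-liner.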

answered Jan 06 '15 at 16:58

muru

Bravo for showing a shell solution too. Your Perl solution is very clean as well. – chicks Jan 06 '15 at 17:38
@mikeserv Wonderful! I always forget %*s even though it was the first printf-related trick I learnt in C programming. – muru Jan 07 '15 at 02:22
The printf(sed) | tr version doesn't work here as far as I can tell. – Natim Jan 07 '15 at 09:03
@Natim here being where? – muru Jan 07 '15 at 10:42
@mikeserv limitations in argument length perhaps? – muru Jan 07 '15 at 11:55
Here being on my computer running bash 4.3.11. I've got this output: https://www.irccloud.com/pastebin/Vfb8KsNA – Natim Jan 07 '15 at 14:39
@Natim do you have an empty leading line? – muru Jan 07 '15 at 14:44
@mikeserv sed can read from filenames given as arguments. O.o – muru Jan 07 '15 at 18:18
@Natim - this whole convo finally makes sense to me - I didn't see the pastebin before. I missed a lot around this question, I guess. Anyway, I'm pretty sure your problem was ...%$*s - you don't want that $ expansion token - just %*s. – mikeserv Jan 12 '15 at 07:46
@mikeserv I didn't notice that typo! – muru Jan 12 '15 at 07:48
Hmm, the solution in this only uses 'sort' and 'uniq': http://stackoverflow.com/questions/6044539/generating-histogram-from-file – qneill Aug 31 '16 at 22:11
@qneill so? It isn't even remotely graphical. – muru Sep 01 '16 at 05:04

Gilles Quénot · Accepted Answer · 2020-05-17T04:28:06.793

14

Try this in Perl:

perl -lane 'print $F[0], "\t", "=" x ($F[1] / 5)' file

EXPLANATIONS:

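-n loops over the input lines, and -l strips each line's newline and adds it back on print
-a turns on autosplit mode: each line is implicitly split() into the @F array, and we get the values with $F[n]
x is the repetition operator: it repeats the string on its left N times
($F[1] / 5) : here we take the count and divide it by 5 to keep the bars short (the repeat count is truncated to an integer)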
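The fixed divisor of 5 suits this particular data set. As a hedged sketch of the same scaling idea, the divisor can be derived from the data so that the largest count fills a chosen width. This uses awk rather than Perl, and fit_bars is a hypothetical helper name; the input is assumed to be "date count" lines on stdin.

```shell
#!/bin/sh
# fit_bars WIDTH: read "date count" lines from stdin and scale the bars so
# that the largest count is WIDTH characters long (hypothetical helper name).
fit_bars() {
    awk -v width="${1:-60}" '
        { date[NR] = $1; count[NR] = $2 + 0; if (count[NR] > max) max = count[NR] }
        END {
            for (i = 1; i <= NR; i++) {
                len = (max > 0) ? int(count[i] * width / max) : 0
                if (len < 1 && count[i] > 0) len = 1   # keep small counts visible
                bar = ""
                for (j = 0; j < len; j++) bar = bar "="
                printf "%-10s %s\n", date[i], bar
            }
        }'
}

printf '%s\n' '2015/1/7 8' '2015/1/10 337' '2015/1/19 1' | fit_bars 60
```

The trade-off is that the whole input has to be held in memory, since the maximum is only known after the last line has been read.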

answered Jan 06 '15 at 16:54

Gilles Quénot

perl -lane 'print $F[0], "\t", $F[1], "\t", "=" x ($F[1] / 3 + 1)' It looks really great :) thanks – Natim Jan 07 '15 at 09:05

iruvar · Answer 3 · 2015-01-07T14:10:34.150

Easy with awk

awk '{$2=sprintf("%-*s", $2, ""); gsub(" ", "=", $2); printf("%-10s%s\n", $1, $2)}' file

2015/1/7  ========
2015/1/8  =================================================
2015/1/9  ========================================
..
..

Or with my favourite programming language

python3 -c 'import sys
for line in sys.stdin:
  data, width = line.split()
  print("{:<10}{:=<{width}}".format(data, "", width=width))' <file

score 9 · Answer 4 · answered Jan 08 '15 at 14:47

How about:

#! /bin/bash
# Bar template: 70 "=" characters, then a "+" that marks a clipped bar
histo="======================================================================+"

while read -r datewd value; do
   # Print the date, then the first $value characters of the template
   echo -n "$datewd      "
   echo "${histo:0:$value}"
done

Which produces:

~/bash $./histogram.sh < histdata.txt
2015/1/7    ========
2015/1/8    =================================================
2015/1/9    ========================================
2015/1/10   ======================================================================+
2015/1/11   ===========
2015/1/12   ===
2015/1/13   =========
2015/1/14   ======================================================================+
2015/1/15   ==============================================================
2015/1/16   ==========
2015/1/17   ==============================
2015/1/18   ==============================
2015/1/19   =
2015/1/20   ===
2015/1/21   =======================
2015/1/22   ============
2015/1/24   ======
2015/1/25   ===
2015/1/27   ==
2015/1/28   ================
2015/1/29   =
2015/2/1    ============
2015/2/2    ==
2015/2/3    =
2015/2/4    ==========
2015/2/5    =============
2015/2/6    ==
2015/2/9    ==
2015/2/10   =========================
2015/2/11   =
2015/2/12   ======
2015/2/13   ============
2015/2/14   ==
2015/2/16   ========
2015/2/17   ========
2015/2/20   =
2015/2/23   =
2015/2/27   =
2015/3/2    ===
2015/3/3    ==
~/bash $
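The substring expansion ${histo:0:$value} in the script above is a bash extension. A minimal sketch of the same clipping idea for a plain sh, assuming the shell's printf accepts * as a precision (bash and dash do): the precision of %.*s caps how many characters of the template are printed. The helper name clip_bar is hypothetical, and the sample lines are inlined only to keep the sketch self-contained.

```shell
#!/bin/sh
# Build the template instead of typing it: 70 "=" plus a "+" overflow marker.
histo="$(printf '%70s' '' | tr ' ' '=')+"

# clip_bar N: print the first N characters of the template (hypothetical
# helper name). The precision in %.*s caps how much of the string is printed.
clip_bar() {
    printf '%.*s\n' "$1" "$histo"
}

while read -r datewd value; do
    printf '%-12s' "$datewd"
    clip_bar "$value"
done <<'EOF'
2015/1/7    8
2015/1/10   337
EOF
```

Any count above 70 prints the full template, including the trailing + that signals the bar was cut off.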

score 9 · Answer 5 · answered May 16 '20 at 19:12

You could do something like that with the bar verb in Miller:

$ mlr --nidx --repifs --ofs tab bar -f 2 file
2015/1/7    ***.....................................
2015/1/8    *******************.....................
2015/1/9    ****************........................
2015/1/10   ***************************************#
2015/1/11   ****....................................
2015/1/12   *.......................................
.
.
.

steeldriver

Miller is new to me. It is very cool! – JJoao Sep 19 '20 at 09:34
Could you please explain a little more what does each parameter do? Even reading the reference is not clear to me – Pablo A Sep 24 '20 at 01:48
Didn't know about Miller until today. Thank you! – dimitarvp Jan 30 '21 at 20:57

score 8 · Answer 6 · answered Jun 03 '20 at 14:19

(This is not exactly what you asked for, but) with gnuplot, if you are running X, try:

gnuplot -p -e 'set sty d hist;set xtic rot; plot "file" u 2:xtic(1)'

JJoao

this is the answer i was looking for. Thank you ! I would have appreciated to know if it can read stdin. and a quick word about data representation. – mh-cbon Aug 18 '21 at 13:38
@mh-cbon, gnuplot can do manyyyy different things (see their docs and demos). Example with stdin: seq 30 | gnuplot -p -e 'set style d hist; plot "-" u ($1**3)'. You can also use "/dev/stdin". – JJoao Aug 18 '21 at 17:22
Oh, thank you! I am well aware that it is very powerful, but I admit I have not yet found what's needed to learn it: a good resource, a topic to work with, motivation. For now, I have resorted to using a plotter written in Go: https://golangdocs.com/plotting-in-golang-histogram-barplot-boxplot But I'm definitely going to try your command. – mh-cbon Aug 18 '21 at 18:06
there is something! But the scaling is bad. let me write a question ^^ – mh-cbon Aug 18 '21 at 18:12
https://unix.stackexchange.com/questions/665243/improve-plot-scale-and-display if you want to take a look – mh-cbon Aug 18 '21 at 18:17
Related: https://stackoverflow.com/questions/2471884/histogram-using-gnuplot – Ciro Santilli OurBigBook.com Jul 22 '23 at 18:23

score 2 · Answer 7 · 2015-01-06T19:20:59.293

This struck me as a fun traditional command line problem. Here's my bash script solution:

awk '{if (count[$1]){count[$1] += $2} else {count[$1] = $2}} \
        END{for (year in count) {print year, count[year];}}' data |
sed -e 's/\// /g' | sort -k1,1n -k2,2n -k3,3n |
awk '{printf("%d/%d/%d\t", $1,$2,$3); for (i=0;i<$4;++i) {printf("=")}; printf("\n");}'

The little script above assumes the data is in a file imaginatively named "data".

I'm not too happy with the "run it through sed and sort" line - it would be unnecessary if your month and day-of-month always had 2 digits, but that's life.
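One way the sed step might be avoided, sketched under the assumption that the dates always look like year/month/day: sort can split on / itself with -t, and a numeric key only reads the leading number of the third field, so the trailing count does not interfere. The helper name sum_and_sort is hypothetical.

```shell
#!/bin/sh
# sum_and_sort: sum the counts per date, then order by year, month and day.
# (Hypothetical helper name.) sort -t/ splits each line on "/"; the third
# field is "day count", and the n modifier reads only its leading number.
sum_and_sort() {
    awk '{ count[$1] += $2 } END { for (date in count) print date, count[date] }' |
    sort -t/ -k1,1n -k2,2n -k3,3n
}

printf '%s\n' '2015/2/1 12' '2015/1/10 300' '2015/1/7 8' '2015/1/10 37' | sum_and_sort
```

Since the dates keep their original form, the final awk stage would only need to print $1 followed by the bar, without reassembling the date from three fields.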

Also, as a historical note, traditional Unixes used to come with a command line plotting utility that could do fairly ugly ASCII graphs and plots. I can't remember the name, but it looks like GNU plotutils replaces the old traditional utility.

@muru - seems to work either way. However, I did find a typo in the "else" clause. Thanks. — , Jan 06 '15 at 19:19

therealneil · Answer 8 · 2018-06-06T16:51:59.697

Try this:

while read value count; do
    printf '%s:\t%s\n' "${value}" "$(printf "%${count}s" | tr ' ' '=')"
done <path/to/my-output

The only tricky part is the construction of the bar. I do it here by delegating to printf and tr like this SO answer.

As a bonus, it's POSIX-sh-compliant.

score 1 · Answer 9 · answered Jan 07 '15 at 01:05

Nice exercise here. I dumped the data in a file called "data" because I am very imaginative.

Well, you asked for it in bash... here it is in pure bash.

cat data | while read date i; do printf "%-10s " $date; for x in $(seq 1 $i); do echo -n "="; done; echo; done

awk is a better option.

awk '{ s=" ";while ($2-->0) s=s"=";printf "%-10s %s\n",$1,s }' data

Falsenames

Can you pipe the data through awk instead of using a file? – Natim Jan 07 '15 at 08:52
Yes, it's the same thing either way. Just add a "cat data |" at the beginning like I had for the bash bits, or a "<data" at the end. Or you can even just have the awk part without a file specified, paste in the data and hit ctrl-D at the end. Specifying the file just treats that file as stdin, and I didn't want to keep copying and pasting the datafile because I'm lazy. – Falsenames Jan 07 '15 at 16:47
Actually, I just reread the question while linking this to a coworker... you said you had "output", not a data file. So you can just run whatever is creating that report, then pipe it to awk, and you're done. Pipes just direct output of the last command as the source of input for the next command. – Falsenames Jan 07 '15 at 17:06
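To illustrate the piping described in these comments, here is a small sketch. The function report is a hypothetical stand-in for whatever command produces the "date count" lines, and draw wraps a lightly adapted copy of the awk one-liner (without the extra leading space in the bar) so that it reads stdin instead of a file.

```shell
#!/bin/sh
# report: hypothetical stand-in for whatever command prints "date count" lines.
report() {
    printf '%s\n' '2015/1/7 8' '2015/1/12 3'
}

# draw: the awk one-liner from the answer, reading stdin instead of a file.
draw() {
    awk '{ s = ""; while ($2-- > 0) s = s "="; printf "%-10s %s\n", $1, s }'
}

# No temporary file: awk reads the report straight from the pipe.
report | draw
```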
