Bash - draw a vertical line behind lines with variable length

Question

I have a text file in the following format, and I want to append a vertical line to each line, followed by an increasing number:

c4-1 d e c
c d e c
e-2 f g2
e4 f g2
g8-4\( a-5 g f\) e4 c
g'8\( a g f\) e4 c
c-1 r c2
c4 r c2

I achieve the line and the numbering with the following while-loop:

#!/bin/bash

while read -r line; do
    if [ -z "$line" ]; then
        echo
        continue
    fi
    n=$((++n)) \
    && grep -vE "^$|^%" <<< "$line" \
    | sed 's/$/\ \|\ \%'$(("$n"))'/'
done < file

and get an output like:

c4-1 d e c | %1
c d e c | %2
e-2 f g2 | %3
e4 f g2 | %4
g8-4\( a-5 g f\) e4 c | %5
g'8\( a g f\) e4 c | %6
c-1 r c2 | %7
c4 r c2 | %8

Now I want the added part to be vertically aligned, to get output like this:

c4-1 d e c            | %1
c d e c               | %2
e-2 f g2              | %3
e4 f g2               | %4
g8-4\( a-5 g f\) e4 c | %5
g'8\( a g f\) e4 c    | %6
c-1 r c2              | %7
c4 r c2               | %8

This means I need to get the length of the longest line (here: 21 characters) and the length of each line, then pad each line with the difference in spaces. How could I achieve this?

You can use wc -L to get the length of the longest line in a file, and then use printf formatting — Sundeep, Nov 20 '19 at 04:01
@Sundeep cheers mate, figured it out using wc -L. Though this attempt is pretty slow. Freddy's answer using column is pretty awesome, check it out :-)) — nath, Nov 20 '19 at 05:03
https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice would help you understand why it is slow — Sundeep, Nov 20 '19 at 05:36
@Sundeep, SE Comment Link Helper can help you make more readily readable links to other Q&As in SE. Like for your link: Why is using a shell loop to process text considered bad practice? — Stéphane Chazelas, Nov 20 '19 at 07:54

score 11 · Accepted Answer · answered Nov 20 '19 at 04:17

You could print the lines without alignment and format the output with column -t and a dummy delimiter character:

#!/bin/bash

while read -r line; do
  if [ -z "$line" ]; then
    echo
    continue
  fi
  printf '%s@| %%%s\n' "$line" "$((++n))"
done < file | column -e -s'@' -t | sed 's/ |/|/'

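Here, I added a @ as a dummy character before the | to mark the end of the first column. The sed command at the end removes one additional space character before the |. Option -e is needed to keep empty lines in the output; this option is specific to some column implementations (such as the BSD-derived one shipped by Debian), so check your man page if your column behaves differently.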

Output:

c4-1 d e c            | %1
c d e c               | %2
e-2 f g2              | %3
e4 f g2               | %4
g8-4\( a-5 g f\) e4 c | %5
g'8\( a g f\) e4 c    | %6
c-1 r c2              | %7
c4 r c2               | %8
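
If you are unsure whether the dummy character is safe, the loop can refuse to run when that character already occurs in the data. This is only a minimal sketch, assuming bash; mark_lines is a hypothetical helper name and not part of the answer above. It buffers the input so that nothing is printed when the check fails:

```shell
#!/bin/bash
# Hypothetical helper (not from the answer above): mark_lines DELIM < input
# Emits each non-empty line followed by DELIM and "| %N", and passes empty
# lines through unnumbered. Fails if DELIM already occurs in the input,
# because column would then split the line at that position as well.
mark_lines() {
  local delim=$1 line n=0
  local -a lines=()
  # "|| [[ -n $line ]]" also picks up a last line without a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == *"$delim"* ]]; then
      printf 'delimiter %q occurs in the input\n' "$delim" >&2
      return 1
    fi
    lines+=("$line")
  done
  for line in "${lines[@]}"; do
    if [[ -z $line ]]; then
      echo
    else
      printf '%s%s| %%%d\n' "$line" "$delim" "$((++n))"
    fi
  done
}
```

It would replace the while loop, for example: mark_lines '@' < file | column -e -s'@' -t | sed 's/ |/|/'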

nice one! Did not think of using column. Was about making it complicated by subtracting line length from the longest lines characters and repeating the spaces. Cheers! <3 — nath, Nov 20 '19 at 04:22

score 9 · Answer 2 · edited Nov 20 '19 at 07:50

Using awk + GNU wc assuming all characters in the input are single-width:

$ awk -v f="$(wc -L < ip.txt)" '{printf "%-*s | %%%s\n", f, $0, NR}' ip.txt
c4-1 d e c            | %1
c d e c               | %2
e-2 f g2              | %3
e4 f g2               | %4
g8-4\( a-5 g f\) e4 c | %5
g'8\( a g f\) e4 c    | %6
c-1 r c2              | %7
c4 r c2               | %8

answered Nov 20 '19 at 05:33 by Sundeep · edited Nov 20 '19 at 07:50 by Stéphane Chazelas

You've got to read the file twice. Fine for most files; less good if huge. – RonJohn Jun 11 '23 at 20:58

glenn jackman · Answer 3 · 2019-11-20T15:48:31.957

Plain bash: works with bash version >= 4.0

#!/bin/bash
mapfile -t lines < file
max=0
for line in "${lines[@]}"; do
    max=$(( ${#line} > max ? ${#line} : max ))
done
for i in "${!lines[@]}"; do
    printf "%-*s | %%%d\n" $max "${lines[i]}" $((i+1))
done

For older bash versions, replace mapfile with a while-read loop: this works with version 3.2

#!/bin/bash
lines=()
max=0
while IFS= read -r line || [[ -n "$line" ]]; do
    lines+=("$line")
    max=$(( ${#line} > max ? ${#line} : max ))
done < file
for i in "${!lines[@]}"; do
    printf "%-*s | %%%d\n" $max "${lines[i]}" $((i+1))
done
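
Both scripts above number every line, including empty ones, whereas the loop in the question passes empty lines through without a number. A possible variant that keeps that behaviour and reads from standard input, so it can sit in a pipeline, could look like the sketch below. The function name align_bar is my own choice for illustration, and like the scripts above it assumes single-width characters:

```shell
#!/bin/bash
# Hypothetical function (name chosen for illustration): align_bar < input
# Pads every non-empty line to the length of the longest line and appends
# " | %N"; empty lines are printed as-is and do not consume a number.
align_bar() {
  local line max=0 n=0
  local -a lines=()
  # First pass over stdin: store the lines and track the maximum length.
  while IFS= read -r line || [[ -n $line ]]; do
    lines+=("$line")
    if (( ${#line} > max )); then
      max=${#line}
    fi
  done
  # Second pass over the stored lines: print them padded to that length.
  for line in "${lines[@]}"; do
    if [[ -z $line ]]; then
      echo
    else
      printf '%-*s | %%%d\n' "$max" "$line" "$((++n))"
    fi
  done
}
```

Usage would be align_bar < file, or some_command | align_bar.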

Kusalananda · Answer 4 · 2019-11-20T22:45:45.523

Assuming there are no @ characters in the data (just replace the two @ used here with another character in that case):

$ awk -v OFS='@| %' '{ print $0, FNR }' file | column -s '@' -t
c4-1 d e c             | %1
c d e c                | %2
e-2 f g2               | %3
e4 f g2                | %4
g8-4\( a-5 g f\) e4 c  | %5
g'8\( a g f\) e4 c     | %6
c-1 r c2               | %7
c4 r c2                | %8

This uses the string @| % as the output field separator and prints the input followed by the line number of each line (separated by this separator), then uses column to align this on the @ characters (these will be removed).

If you're fond of sed or of awkward regular expressions, you could always number the lines with cat -n or nl -b a and then move the line numbers to the end of the line and insert @| % using sed, before calling column:

$ cat -n file | sed -E 's/^[[:blank:]]*([[:digit:]]+)[[:blank:]]*(.*)$/\2@| \%\1/' | column -s '@' -t
c4-1 d e c             | %1
c d e c                | %2
e-2 f g2               | %3
e4 f g2                | %4
g8-4\( a-5 g f\) e4 c  | %5
g'8\( a g f\) e4 c     | %6
c-1 r c2               | %7
c4 r c2                | %8

Using awk to read your file twice, once to figure out the maximum line length (m) and again to format the lines to this length. column is not used here (or in the last solution):

$ awk 'FNR==NR { m=(length>m?length:m); next } { printf("%-*s | %%%d\n", m, $0, FNR) }' file file
c4-1 d e c            | %1
c d e c               | %2
e-2 f g2              | %3
e4 f g2               | %4
g8-4\( a-5 g f\) e4 c | %5
g'8\( a g f\) e4 c    | %6
c-1 r c2              | %7
c4 r c2               | %8

Notice that the filename is given twice on the command line.
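
If typing the filename twice feels error-prone, the command can be wrapped in a small shell function. This is just a sketch; align_file is a made-up name, and it assumes the argument is a regular file, since a pipe cannot be read twice:

```shell
#!/bin/bash
# Hypothetical wrapper (name made up for illustration): align_file FILE
# FILE must be re-readable (a regular file, not a pipe or stdin).
align_file() {
  local file=$1
  # Pass 1 (FNR == NR): remember the longest line length in m.
  # Pass 2: left-justify each line to width m and append " | %N".
  awk 'FNR == NR { if (length($0) > m) m = length($0); next }
       { printf "%-*s | %%%d\n", m, $0, FNR }' "$file" "$file"
}
```

Calling align_file file should then give the same output as the command above.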

Same as above, but storing the file in memory as an array (a) and printing it, padded to the longest line length, at the end. This reads the file only once, trading less disk access for higher memory consumption:

$ awk '{ a[FNR]=$0; m=(length>m?length:m) } END { for (i=1; i<=FNR; ++i) printf("%-*s | %%%d\n", m, a[i], i) }' file
c4-1 d e c            | %1
c d e c               | %2
e-2 f g2              | %3
e4 f g2               | %4
g8-4\( a-5 g f\) e4 c | %5
g'8\( a g f\) e4 c    | %6
c-1 r c2              | %7
c4 r c2               | %8

THX, the awk ones are all damn fast - NICE! EDIT: well the sed one too! — nath, Nov 20 '19 at 21:26

nath · Answer 5 · 2019-11-20T05:07:27.507

Just for the record: this is horribly slow, but it was my first attempt, using wc -L.
I'm definitely going with @Freddy's answer using column!

#!/bin/bash

file="$1"

ll=$(wc -L < "$file")

while read -r line; do
    if [ -z "$line" ]; then
        echo
        continue
    fi
    sl=$(wc -L <<< "$line")
    if [ "$ll" = "$sl" ]; then
        as=$(echo "$ll - $sl" | bc)
    else
        as=$(echo "$ll - $sl + 1" | bc)
    fi
    space=$(printf '\ %.0s' $(seq "$as") )
    n=$((++n)) \
    && grep -vE "^$|^%" <<< "$line" \
    | sed "s/$/$space\ \|\ \%$(printf "%s" "$n")/"
done < "$file"

It works, though with one additional space:

c4-1 d e c             | %1
c d e c                | %2
e-2 f g2               | %3
e4 f g2                | %4
g8-4\( a-5 g f\) e4 c  | %5
g'8\( a g f\) e4 c     | %6
c-1 r c2               | %7
c4 r c2                | %8
