10

I have a text file in the following format, and I want to append a vertical bar to each line, followed by an increasing number:

c4-1 d e c
c d e c
e-2 f g2
e4 f g2
g8-4\( a-5 g f\) e4 c
g'8\( a g f\) e4 c
c-1 r c2
c4 r c2 

I achieve the line and the numbering with the following while-loop:

#!/bin/bash

while read -r line; do
    if [ -z "$line" ]; then
        echo
        continue
    fi
    n=$((++n)) \
    && grep -vE "^$|^%" <<< "$line" \
    | sed 's/$/\ \|\ \%'$(("$n"))'/'
done < file

and get an output like:

c4-1 d e c | %1
c d e c | %2
e-2 f g2 | %3
e4 f g2 | %4
g8-4\( a-5 g f\) e4 c | %5
g'8\( a g f\) e4 c | %6
c-1 r c2 | %7
c4 r c2 | %8

Now I want the additions to be vertically aligned, with output like this:

c4-1 d e c            | %1
c d e c               | %2
e-2 f g2              | %3
e4 f g2               | %4
g8-4\( a-5 g f\) e4 c | %5
g'8\( a g f\) e4 c    | %6
c-1 r c2              | %7
c4 r c2               | %8

This means I need to find the length of the longest line (here: 21 characters) and the length of each line, then pad each line with the difference in spaces. How could I achieve this?

nath
  • 5,694

5 Answers

11

You could print the lines without alignment and format the output with column -t and a dummy delimiter character:

#!/bin/bash

while read -r line; do
  if [ -z "$line" ]; then
    echo
    continue
  fi
  printf '%s@| %%%s\n' "$line" "$((++n))"
done < file | column -e -s'@' -t | sed 's/ |/|/'

Here, I added a @ as dummy character before the | indicating the end of the column. The sed command at the end is used to remove one additional space character before the |. Option -e is needed to keep empty lines in the output.

Output:

c4-1 d e c            | %1
c d e c               | %2
e-2 f g2              | %3
e4 f g2               | %4
g8-4\( a-5 g f\) e4 c | %5
g'8\( a g f\) e4 c    | %6
c-1 r c2              | %7
c4 r c2               | %8
Freddy
  • 25,565
  • Nice one! I did not think of using column. I was about to make it complicated by subtracting each line's length from the longest line's length and repeating spaces. Cheers! <3 – nath Nov 20 '19 at 04:22
  • Is the if statement defensive programming? – RonJohn Jun 11 '23 at 20:56
9

Using awk + GNU wc assuming all characters in the input are single-width:

$ awk -v f="$(wc -L < ip.txt)" '{printf "%-*s | %%%s\n", f, $0, NR}' ip.txt
c4-1 d e c            | %1
c d e c               | %2
e-2 f g2              | %3
e4 f g2               | %4
g8-4\( a-5 g f\) e4 c | %5
g'8\( a g f\) e4 c    | %6
c-1 r c2              | %7
c4 r c2               | %8
Sundeep
  • 12,008
5

Plain bash: works with bash version >= 4.0

#!/bin/bash
mapfile -t lines < file
max=0
for line in "${lines[@]}"; do
    max=$(( ${#line} > max ? ${#line} : max ))
done
for i in "${!lines[@]}"; do
    printf "%-*s | %%%d\n" $max "${lines[i]}" $((i+1))
done

For older bash versions, replace mapfile with a while-read loop: this works with version 3.2

#!/bin/bash
lines=()
max=0
while IFS= read -r line || [[ -n "$line" ]]; do
    lines+=("$line")
    max=$(( ${#line} > max ? ${#line} : max ))
done < file
for i in "${!lines[@]}"; do
    printf "%-*s | %%%d\n" $max "${lines[i]}" $((i+1))
done
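
Both scripts above do the padding with printf's %-*s conversion: the * takes the field width from the next argument, and the - left-justifies the text inside that field. A minimal sketch of just that step, assuming bash's builtin printf (the helper name pad is hypothetical and not part of the answer):

```bash
#!/bin/bash
# pad WIDTH TEXT: print TEXT left-justified in a field of WIDTH characters,
# followed by a '|' so the padding is visible. Hypothetical helper.
pad() {
    local width=$1 text=$2
    printf '%-*s|\n' "$width" "$text"
}

pad 8 abc        # "abc" plus 5 spaces, then "|"
pad 8 abcdefgh   # already 8 characters wide, so no padding is added
```

Text longer than the requested width is printed in full rather than truncated, which is why the width has to be the length of the longest line.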
glenn jackman
  • 85,964
3

Assuming there are no @ characters in the data (just replace the two @ used here with another character in that case):

$ awk -v OFS='@| %' '{ print $0, FNR }' file | column -s '@' -t
c4-1 d e c             | %1
c d e c                | %2
e-2 f g2               | %3
e4 f g2                | %4
g8-4\( a-5 g f\) e4 c  | %5
g'8\( a g f\) e4 c     | %6
c-1 r c2               | %7
c4 r c2                | %8

This uses the string @| % as the output field separator and prints the input followed by the line number of each line (separated by this separator), then uses column to align this on the @ characters (these will be removed).
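
If you are not sure whether @ occurs in the data, you could check before choosing the dummy delimiter. The following is only a sketch: the function name pick_delim and the list of candidate characters are assumptions made for illustration.

```bash
#!/bin/bash
# pick_delim FILE: print the first candidate character that does not occur
# anywhere in FILE; return non-zero if every candidate is already in use.
# Hypothetical helper, the candidate list is arbitrary.
pick_delim() {
    local file=$1 c
    for c in '@' '#' '~' '^' ';'; do
        if ! grep -qF -- "$c" "$file"; then
            printf '%s\n' "$c"
            return 0
        fi
    done
    return 1
}

# Small demonstration on a temporary file that already contains @ and #.
tmp=$(mktemp)
printf '%s\n' 'a@b' 'c#d' > "$tmp"
pick_delim "$tmp"
rm -f "$tmp"

# Possible use with the pipeline above (not run here):
#   d=$(pick_delim file) && awk -v OFS="$d| %" '{ print $0, FNR }' file | column -s "$d" -t
```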


If you're fond of sed or of awkward regular expressions, you could always number the lines with cat -n or nl -b a and then move the line numbers to the end of the line and insert @| % using sed, before calling column:

$ cat -n file | sed -E 's/^[[:blank:]]*([[:digit:]]+)[[:blank:]]*(.*)$/\2@| \%\1/' | column -s '@' -t
c4-1 d e c             | %1
c d e c                | %2
e-2 f g2               | %3
e4 f g2                | %4
g8-4\( a-5 g f\) e4 c  | %5
g'8\( a g f\) e4 c     | %6
c-1 r c2               | %7
c4 r c2                | %8

Using awk to read your file twice, once to figure out the maximum line length (m) and again to format the lines to this length. column is not used here (or in the last solution):

$ awk 'FNR==NR { m=(length>m?length:m); next } { printf("%-*s | %%%d\n", m, $0, FNR) }' file file
c4-1 d e c            | %1
c d e c               | %2
e-2 f g2              | %3
e4 f g2               | %4
g8-4\( a-5 g f\) e4 c | %5
g'8\( a g f\) e4 c    | %6
c-1 r c2              | %7
c4 r c2               | %8

Notice that the filename is given twice on the command line.
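
The loop in the question prints empty lines as they are and does not number them, whereas the command above numbers every line. A possible variant of the two-pass approach that keeps that behaviour is sketched below. The function name align_keep_blank is hypothetical, and the format string is built by concatenation instead of using %-*s, in case your awk does not support * in printf.

```bash
#!/bin/bash
# align_keep_blank FILE: two passes over FILE with awk. Hypothetical helper.
align_keep_blank() {
    awk '
        # Pass 1: remember the length of the longest line.
        FNR == NR { if (length($0) > m) m = length($0); next }
        # Start of pass 2: build a format such as "%-21s | %%%d\n".
        FNR == 1  { fmt = "%-" m "s | %%%d\n" }
        # Empty or whitespace-only lines are printed empty and not numbered.
        NF == 0   { print ""; next }
        # All other lines are padded and numbered with a separate counter.
        { printf fmt, $0, ++n }
    ' "$1" "$1"
}

# Small demonstration on a temporary file with one empty line.
tmp=$(mktemp)
printf '%s\n' 'c d e c' '' 'e-2 f g2' > "$tmp"
align_keep_blank "$tmp"
rm -f "$tmp"
```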


Same as above, but storing the file in memory as an array (a) and printing it according to the longest line length at the end. This trades memory for disk access: the file is read only once, at the cost of holding all of it in memory:

$ awk '{ a[FNR]=$0; m=(length>m?length:m) } END { for (i=1; i<=FNR; ++i) printf("%-*s | %%%d\n", m, a[i], i) }' file
c4-1 d e c            | %1
c d e c               | %2
e-2 f g2              | %3
e4 f g2               | %4
g8-4\( a-5 g f\) e4 c | %5
g'8\( a g f\) e4 c    | %6
c-1 r c2              | %7
c4 r c2               | %8
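
Because the in-memory variant reads its input only once, it also works when the data comes from a pipe, which cannot be read twice the way the previous command requires. A minimal sketch of that use, assuming nothing beyond a POSIX awk (the function name align_stdin is hypothetical, and it keeps its own line counter n instead of relying on FNR in the END block):

```bash
#!/bin/bash
# align_stdin: read lines from standard input, store them in memory, then
# print them padded to the longest line and numbered. Hypothetical helper.
align_stdin() {
    awk '
        { a[++n] = $0; if (length($0) > m) m = length($0) }
        END {
            fmt = "%-" m "s | %%%d\n"   # e.g. "%-21s | %%%d\n"
            for (i = 1; i <= n; ++i) printf fmt, a[i], i
        }'
}

# The input here comes from a pipe rather than from a named file.
printf '%s\n' 'c d e c' 'e-2 f g2' | align_stdin
```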
Kusalananda
  • 333,661
2

Just for the record (this is horribly slow, but it was my first attempt, using wc -L).
I am definitely going with @Freddy's answer using column!

#!/bin/bash

file="$1"

ll=$(wc -L < "$file")

while read -r line; do
    if [ -z "$line" ]; then
        echo
        continue
    fi
    sl=$(wc -L <<< "$line")
    if [ "$ll" = "$sl" ]; then
        as=$(echo "$ll - $sl" | bc)
    else
        as=$(echo "$ll - $sl + 1" | bc)
    fi
    space=$(printf '\ %.0s' $(seq "$as") )
    n=$((++n)) \
    && grep -vE "^$|^%" <<< "$line" \
    | sed "s/$/$space\ \|\ \%$(printf "%s" "$n")/"
done < "$file"

It works, though with one additional space before the vertical bar:

c4-1 d e c             | %1
c d e c                | %2
e-2 f g2               | %3
e4 f g2                | %4
g8-4\( a-5 g f\) e4 c  | %5
g'8\( a g f\) e4 c     | %6
c-1 r c2               | %7
c4 r c2                | %8
nath
  • 5,694