
I have the following file (note that the ======== strings really are present in the file):

start ======== id: 5713
start ======== id: 5911
start ======== id: 5911
end ========= id: 5911
start ======== id: 6111
end ========= id: 5713
start ======== id: 31117

I want to remove every pair of lines that share the same id, where one line contains start and the other contains end.

For the example above, the output should be:

start ======== id: 5911
start ======== id: 6111
start ======== id: 31117

How can I do this with bash, awk, sed, ...?

AdminBee
MOHAMED

2 Answers


Using any awk in any shell on every Unix box, this will print as many unpaired start and/or end lines as exist in your input:

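$ cat tst.awk
# cnt[id] = number of start lines minus number of end lines for that id.
# Updating it only inside the start/end rules means any other line is ignored.
$1 == "start" { beg[$NF] = $0; cnt[$NF]++ }
$1 == "end"   { end[$NF] = $0; cnt[$NF]-- }
END {
    for ( key in cnt ) {
        # cnt[key] > 0: print that many unmatched start lines
        for (i=1; i<=cnt[key]; i++) {
            print beg[key]
        }
        # cnt[key] < 0: print that many unmatched end lines
        for (i=-1; i>=cnt[key]; i--) {
            print end[key]
        }
    }
}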

$ awk -f tst.awk file
start ======== id: 5911
start ======== id: 6111
start ======== id: 31117
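The key quantity is cnt[id], the number of start lines minus the number of end lines seen for that id: a positive value means that many unmatched start lines, a negative value that many unmatched end lines, and zero means everything paired up. A quick sketch that prints only those net counts for the question's sample (piped through sort -n purely so the output order is deterministic):

```shell
#!/bin/sh
# Print the net count (starts minus ends) per id for the question's sample.
counts=$(printf '%s\n' \
    'start ======== id: 5713' \
    'start ======== id: 5911' \
    'start ======== id: 5911' \
    'end ========= id: 5911' \
    'start ======== id: 6111' \
    'end ========= id: 5713' \
    'start ======== id: 31117' |
    awk '$1 == "start" { cnt[$NF]++ }    # a start line adds 1
         $1 == "end"   { cnt[$NF]-- }    # an end line subtracts 1
         END { for (key in cnt) print key, cnt[key] }' |
    sort -n)
printf '%s\n' "$counts"
# prints:
# 5713 0
# 5911 1
# 6111 1
# 31117 1
```

Every id with a non-zero count contributes exactly |cnt[id]| lines to the output of tst.awk.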

To demonstrate more thoroughly, here is a more comprehensive sample input:

$ cat file
start ======== id: 5713
start ======== id: 5911
start ======== id: 5911
start ======== id: 5911
end ========= id: 5911
start ======== id: 6111
end ========= id: 5713
end ========= id: 5713
start ======== id: 31117

$ awk -f tst.awk file
end ========= id: 5713
start ======== id: 5911
start ======== id: 5911
start ======== id: 6111
start ======== id: 31117
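Because for ( key in cnt ) visits the ids in an unspecified order, tst.awk may print the surviving lines in a different order from the input. If input order matters, here is a minimal sketch of a different technique, a per-id stack instead of a net count. The helper name drop_pairs is hypothetical, and the sketch assumes an end line should only cancel a start line with the same id that appears before it (the most recent one still unmatched):

```shell
#!/bin/sh
# Sketch only: drop_pairs is a hypothetical helper, not from either answer.
# Assumption: an "end" line cancels the most recent still-unmatched "start"
# line with the same id (the last field); everything else is kept.
drop_pairs() {
    awk '
        { line[NR] = $0; keep[NR] = 1 }                 # buffer every line
        $1 == "start" { stk[$NF, ++depth[$NF]] = NR }   # push on this id stack
        $1 == "end" && depth[$NF] > 0 {
            keep[stk[$NF, depth[$NF]--]] = 0            # drop the matching start
            keep[NR] = 0                                # and this end line
        }
        END { for (i = 1; i <= NR; i++) if (keep[i]) print line[i] }  # input order
    ' "$@"
}

# Usage with the sample input from the question (or simply: drop_pairs file)
printf '%s\n' \
    'start ======== id: 5713' \
    'start ======== id: 5911' \
    'start ======== id: 5911' \
    'end ========= id: 5911' \
    'start ======== id: 6111' \
    'end ========= id: 5713' \
    'start ======== id: 31117' |
    drop_pairs
# prints:
# start ======== id: 5911
# start ======== id: 6111
# start ======== id: 31117
```

On the larger sample above it keeps the same five lines as tst.awk, but in input order (the unmatched end 5713 line stays after start 6111). The trade-offs: it buffers the whole file in memory, whereas tst.awk stores only one line per id, and an end line with no earlier unmatched start is kept, whereas tst.awk would cancel it against a later start.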
Ed Morton
    +1: nice and well analyzed; you essentially print all the hits that contribute to cnt[key] not being zero. Adding that word of explanation might go a long way to explain the algorithmic approach you chose. Also replacing the END block with: END{for ( key in cnt ) {mult=cnt[key]>=0?1:-1; for (i=mult; mult*i<=mult*cnt[key]; i+=mult) {print cnt[key]>=0?beg[key]:end[key]} }} collapses the 2 inner for loops into 1. It works but unfortunately it's not really readable. – Cbhihe Sep 22 '21 at 07:18
    Anyway, it's a great solution. – K-attila- Sep 22 '21 at 13:24

Using just sed, nl and sort:

nl file -s ":" | sort -t ":" -k 3 -k 2 | sed -n ":x s/\n[0-9 ]*$//;/end[^\n]*$/{N;bx};s/\(.*\)[ 0-9]*:end .*id:\( [0-9]*\).*\n.*start.*id:\2[^0-9]*$/\1/;tx;s/\n$//;/start/{P;D};/^[ 0-9]*:end[^\n]*/{s/\n[0-9:]*$/$/;N;bx};/start/P;/end/P;" | sort -n | sed "s/[ 0-9]*://"

$ nl tt -s ":"|sort -t ":" -k 3 -k 2 | sed -n ":x s/\n[0-9 ]*$//;/end[^\n]*$/{N;bx};s/\(.*\)[ 0-9]*:end .*id:\( [0-9]*\).*\n.*start.*id:\2[^0-9]*$/\1/;tx;s/\n$//;/start/{P;D};/^[ 0-9]*:end[^\n]*/{s/\n[0-9:]*$/$/;N;bx};/start/P;/end/P;" | sort -n| sed "s/[ 0-9]*://"
start ======== id: 5713
start ======== id: 5713
start ======== id: 5713
start ======== id: 5713
start ======== id: 5911
start ======== id: 6111
end ======== id: 31117

If the order is not important (and every end line has a matching start line):

sort file -t ":" -k 2 | sed -e '/end/{N;d;}'

start ======== id: 31117
start ======== id: 5911
start ======== id: 6111
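To see why that caveat matters, here is a quick check with made-up input in which id 5713 has two end lines but only one start line. Sorting puts both end lines first (lines whose keys compare equal fall back to a whole-line comparison, and "end" sorts before "start"), so N joins the two end lines and d deletes them, leaving the start line instead of the one unmatched end line:

```shell
#!/bin/sh
# Made-up input for illustration: id 5713 has two end lines, one start line.
result=$(printf '%s\n' \
    'end ========= id: 5713' \
    'start ======== id: 5713' \
    'end ========= id: 5713' |
    sort -t ":" -k 2 | sed -e '/end/{N;d;}')
# The correct result would be one unmatched "end ========= id: 5713" line,
# but the pipeline keeps the start line instead.
printf '%s\n' "$result"   # prints: start ======== id: 5713
```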

This version is better (it still needs some polishing, but it works):

sort file -t ":" -k 2 | sed -n ":x ;/end[^\n]*$/{N;bx};s/\(.*\)end .*id:\( [0-9]*\).*start.*id:\2[^0-9]*$/\1/;tx;s/\n$//;/start/{P;D};/^end[^\n]*/{s/\n$/$/;N;bx};/start/P;/end/P"

$ cat tt
start ======== id: 5713
start ======== id: 5713
start ======== id: 5713
start ======== id: 5713
start ======== id: 5713
start ======== id: 5713
start ======== id: 5713
dggdgtfZZ
start ======== id: 5713
start ======== id: 5713
start ======== id: 5911
start ======== id: 5911
end ========= id: 5911
start ======== id: 6111
end ========= id: 5713
end ========= id: 5713
end ========= id: 5713
end ========= id: 5713
end ========= id: 5713
start ======== id: 31117
end ======== id: 31117
end ======== id: 31117

$ sort -t ":" -k 2 tt | sed -n ":x ;/end[^\n]*$/{N;bx};s/\(.*\)end .*id:\( [0-9]*\).*start.*id:\2[^0-9]*$/\1/;tx;s/\n$//;/start/{P;D};/^end[^\n]*/{s/\n$/$/;N;bx};/start/P;/end/P"
end ======== id: 31117
start ======== id: 5713
start ======== id: 5713
start ======== id: 5713
start ======== id: 5713
start ======== id: 5911
start ======== id: 6111

K-attila-