Extract paired lines

Question

I have a text file like the one below. Its lines come in pairs: either email/password or email/hash.

EMAIL:user1@site.com
PASSWORD:pass1
EMAIL:user2@site.com
PASSWORD:pass2
EMAIL:user3@site.com
PASSWORD:pass3
EMAIL:user4@site.com
HASH:qwerty123
EMAIL:user5@site.com
HASH:somehash
EMAIL:user6@site.com
PASSWORD:pass6

I'm trying to extract only the email/password pairs and skip the email/hash pairs. The command below doesn't work properly in my case:

sed -e 's/.*EMAIL://' -e 's/.*PASSWORD://' -e "/\b\HASH\b/d" test.txt

Expected output:

user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6

Can a : exist in any of the fields? – Ed Morton, Sep 01 '20 at 12:20

Quasímodo · Answer 1 · 2020-09-01T14:56:24.903

sed -n 'N;s/^EMAIL://;s/PASSWORD://p' file

N append next line to pattern space,
s/^EMAIL:// substitute EMAIL: with nothing,
s/PASSWORD://p substitute PASSWORD: with nothing and only print if the substitution was successful.

Tested on sample input. Assumption: 1st line is EMAIL:, 2nd is PASSWORD: or HASH:, and repeat.
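
To see the pairing at work, here is a minimal sketch on a cut-down sample. The temporary file and the shortened sample are made up for illustration, and mktemp is assumed to be available.

```shell
# Cut-down sample: password pair, hash pair, password pair.
sample=$(mktemp)
cat > "$sample" <<'EOF'
EMAIL:user1@site.com
PASSWORD:pass1
EMAIL:user4@site.com
HASH:qwerty123
EMAIL:user6@site.com
PASSWORD:pass6
EOF

# N joins each line with the one after it, so every cycle sees one pair.
# The final s///p prints only pairs whose second line carries PASSWORD:.
out=$(sed -n 'N;s/^EMAIL://;s/PASSWORD://p' "$sample")
printf '%s\n' "$out"
rm -f "$sample"
```

The hash pair goes through the same two substitutions, but since the second one fails, nothing is printed for it.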

As a bonus: if the input may contain blank lines, Awk is the better tool:

awk -F ':' '/^PASSWORD:/{print line;print $2}/^EMAIL:/{line=$2}' file
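
A rough comparison of the two commands on input with a stray blank line. The input below is hypothetical; it only serves to shift the line pairing.

```shell
# Hypothetical input: a blank line after the first pair shifts the pairing.
input='EMAIL:user1@site.com
PASSWORD:pass1

EMAIL:user4@site.com
HASH:qwerty123
EMAIL:user6@site.com
PASSWORD:pass6'

# sed pairs lines strictly two by two, so it loses sync after the blank line.
sed_out=$(printf '%s\n' "$input" | sed -n 'N;s/^EMAIL://;s/PASSWORD://p')

# awk remembers the most recent EMAIL: value and prints it whenever a
# PASSWORD: line is seen, so the blank line does no harm.
awk_out=$(printf '%s\n' "$input" |
  awk -F ':' '/^PASSWORD:/{print line;print $2}/^EMAIL:/{line=$2}')

printf 'sed:\n%s\nawk:\n%s\n' "$sed_out" "$awk_out"
```

Here sed should report only the first pair, while awk still finds both password pairs.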

glenn jackman · Answer 2 · 2020-09-01T11:33:37.827

Here's another couple of variations:

paste -d :  - - < myfile | awk -F: '$3 == "PASSWORD" {print $2; print $4}'

tac myfile | awk -F: '$1 == "PASSWORD" {print $2; getline; print $2}' | tac
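
For the paste variant it may help to look at the intermediate rows. This is only a sketch on a shortened, made-up sample; like the original command, it assumes the values themselves contain no colon.

```shell
input='EMAIL:user1@site.com
PASSWORD:pass1
EMAIL:user4@site.com
HASH:qwerty123'

# paste - - reads two lines per output row and joins them with ":",
# giving four fields: EMAIL, address, PASSWORD or HASH, value.
joined=$(printf '%s\n' "$input" | paste -d : - -)
printf '%s\n' "$joined"

# awk then keeps only the rows whose third field is PASSWORD.
out=$(printf '%s\n' "$joined" |
  awk -F: '$3 == "PASSWORD" {print $2; print $4}')
printf '%s\n' "$out"
```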

edited Sep 01 '20 at 11:33

answered Aug 31 '20 at 22:58

Rakesh Sharma · Answer 3 · 2020-09-01T06:20:36.243

Problem identification: sed is a line-oriented stream editor, so when the decision to print a line depends on another line, as it does here, we need a small state machine, which in turn needs a flip-flop or a variable to hold the state.

Essentially, we hold back printing until we see the right state transition: here, only the transition from an email line to a password line.

The following uses GNU sed in extended regex mode (-E), which makes the sed code easier to read and less prone to backslashitis.

$ sed -Ee '
    /^PASSWORD:/!{h;d;}
    x;G;s/(^|\n)[^:]*:/\1/g
' test.txt

The basic idea is to save every line that isn't a password line in the hold space, so that it is available when we reach the password line.
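
The same sed script again with each step commented, run on a shortened sample. This is just an annotated sketch; the sample input is made up, and GNU sed is assumed for -E.

```shell
input='EMAIL:user1@site.com
PASSWORD:pass1
EMAIL:user4@site.com
HASH:qwerty123
EMAIL:user6@site.com
PASSWORD:pass6'

out=$(printf '%s\n' "$input" | sed -E '
    # Not a PASSWORD: line, so park it in the hold space and print nothing.
    /^PASSWORD:/!{h;d;}
    # PASSWORD: line. x swaps it with the parked line, G appends it back,
    # leaving "parked line, newline, PASSWORD: line" in the pattern space.
    x;G
    # Strip the leading TAG: from both lines. The pair is then auto-printed.
    s/(^|\n)[^:]*:/\1/g
')
printf '%s\n' "$out"
```

A HASH: line simply overwrites the parked email line and is itself overwritten by the next EMAIL: line, so it never reaches the output.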

With GNU awk, we essentially code the same logic as the sed version, with the awk variable e serving as the hold space.

$ awk -F: '
    /^PASSWORD:/&&
    ($0=e RS $2)"";{e=$2}
' test.txt

With GNU grep, the -B1 option prints the line before each password line. A second grep then removes the -- separator line that grep prints between groups, on the assumption that no input line consists of exactly --.

$ < test.txt \
  grep -B1 '^PASSWORD:' |
  grep -Fxve -- | cut -d: -f2-
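
GNU grep can also be told not to print the separator in the first place, which makes the second grep unnecessary. A sketch, assuming a GNU grep that supports --no-group-separator; the sample input is shortened for illustration.

```shell
input='EMAIL:user1@site.com
PASSWORD:pass1
EMAIL:user4@site.com
HASH:qwerty123
EMAIL:user6@site.com
PASSWORD:pass6'

# Plain -B1 output: a "--" line separates the two non-adjacent groups.
with_sep=$(printf '%s\n' "$input" | grep -B1 '^PASSWORD:')
printf '%s\n' "$with_sep"

# With --no-group-separator (GNU grep) the "--" line is never emitted.
out=$(printf '%s\n' "$input" |
  grep --no-group-separator -B1 '^PASSWORD:' |
  cut -d: -f2-)
printf '%s\n' "$out"
```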

With perl, we append the next line to each EMAIL: line and then test the pair:

$ perl -ne '
    /^EMAIL:/ && ($_ .= <>);
    /\nPASSWORD:/ && print(s/^[^:]+://mgr);
' test.txt

bash builtins

while IFS=: read -r a p; do
  case $a in
    'PASSWORD') printf '%s\n' "$e" "$p" ;;
    *) e=$p ;;
  esac
done < test.txt

+1: so many ways ... I particularly like the bash built-in variation. — Cbhihe, Sep 01 '20 at 08:04
a portable way to get rid of lines starting with "--" in them could be : grep -v "^--" — Olivier Dulac, Sep 01 '20 at 08:42
Note that the bash builtins variant (which btw has nothing bash-specific in it) doesn't work properly if the email address or password ends in a : character (and contains no other : character). I can't help but to also refer to Why is using a shell loop to process text considered bad practice? here. — Stéphane Chazelas, Sep 01 '20 at 13:44

Kusalananda · Answer 4 · 2020-09-01T07:29:37.437

Using sed to pull out the lines that we're interested in, and then cut to remove the PASSWORD: and EMAIL: strings from the start of each line:

$ sed -n 'N; /\nPASSWORD:/p' file | cut -d : -f 2-
user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6

Using cut here helps keep the sed expression, and therefore the whole command, simple.

edited Sep 01 '20 at 07:29

answered Sep 01 '20 at 07:12

Ed Morton · Answer 5 · 2020-09-01T16:42:07.430

If the data you want printed can't contain any :s:

$ awk -F':' '$1=="PASSWORD"{print prev ORS $2} {prev=$2}' file
user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6

Or, if it can, any of these would work; pick whichever you prefer:

$ awk '{prev=curr; curr=$0; sub(/[^:]+:/,"",curr)} /^PASSWORD:/{print prev ORS curr}' file
user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6
$ awk -F':' '{tag=$1; sub(/[^:]+:/,"")} tag=="PASSWORD"{print prev ORS $0} {prev=$0}' file
user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6
$ awk -F':' '{tag=$1; sub(/[^:]+:/,""); prev=curr; curr=$0} tag=="PASSWORD"{print prev ORS curr}' file
user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6
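
To illustrate the difference, here is a small sketch with a hypothetical password that contains a colon, comparing the first command above with the first of the colon-safe ones.

```shell
# Hypothetical pair whose password contains a colon.
input='EMAIL:user1@site.com
PASSWORD:pa:ss1'

# Splitting on ":" keeps only the second field, so the password is cut short.
short=$(printf '%s\n' "$input" |
  awk -F':' '$1=="PASSWORD"{print prev ORS $2} {prev=$2}')

# Removing just the leading TAG: keeps everything after the first colon.
full=$(printf '%s\n' "$input" |
  awk '{prev=curr; curr=$0; sub(/[^:]+:/,"",curr)} /^PASSWORD:/{print prev ORS curr}')

printf '%s\n---\n%s\n' "$short" "$full"
```

The first command should print pa as the password, the second pa:ss1.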

Stéphane Chazelas · Answer 6 · 2020-09-01T12:42:30.830

If that file is always in that format with EMAIL at the start of every other line:

sed -n 'N;s/^EMAIL:\(.*\n\)PASSWORD:/\1/p'

should do it. Or, to be on the safe side, look for EMAIL: as the start of the record:

sed -n '/^EMAIL:/{N;s/^EMAIL:\(.*\n\)PASSWORD:/\1/p;}'
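
What "the safe side" buys can be sketched with input that starts with an extra line. The header line below is made up; it just moves the EMAIL: lines off the odd line numbers.

```shell
# Hypothetical input with a leading header line, which puts the
# EMAIL: lines on even line numbers instead of odd ones.
input='# exported accounts
EMAIL:user1@site.com
PASSWORD:pass1
EMAIL:user4@site.com
HASH:qwerty123'

# Unguarded: N pairs the header with the first EMAIL: line, so no pair matches.
plain=$(printf '%s\n' "$input" |
  sed -n 'N;s/^EMAIL:\(.*\n\)PASSWORD:/\1/p')

# Guarded: N only runs on EMAIL: lines, so the pairing stays in sync.
guarded=$(printf '%s\n' "$input" |
  sed -n '/^EMAIL:/{N;s/^EMAIL:\(.*\n\)PASSWORD:/\1/p;}')

printf 'plain: [%s]\nguarded:\n%s\n' "$plain" "$guarded"
```

The unguarded command should print nothing for this input, while the guarded one still extracts the password pair.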

You could also use pcregrep with its multiline mode:

pcregrep -M -o1 -o2 --om-separator=$'\n' '^EMAIL:(.*)\nPASSWORD:(.*)'

edited Sep 01 '20 at 12:42

answered Sep 01 '20 at 12:35

Praveen Kumar BS · Answer 7 · 2020-09-01T19:06:00.600

sed "N;s/\n/ /g" filename | awk -F ":" '/PASSWORD/{gsub(" PASSWORD","",$2); print $2"\n"$3}'

Output:

user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6

Python

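#!/usr/bin/env python3
# Read the file, stripping surrounding whitespace from every line.
with open('filename') as f:
    lines = [line.strip() for line in f]
# Walk the lines two at a time: an EMAIL line, then a PASSWORD or HASH line.
for first, second in zip(lines[::2], lines[1::2]):
    if second.startswith("PASSWORD:"):
        # Split on the first colon only, so values containing ":" stay intact.
        print(first.split(":", 1)[-1])
        print(second.split(":", 1)[-1])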

edited Sep 01 '20 at 19:06

answered Sep 01 '20 at 18:29
