
I have a text file like this one, containing pairs of lines: email-password and email-hash.

EMAIL:user1@site.com
PASSWORD:pass1
EMAIL:user2@site.com
PASSWORD:pass2
EMAIL:user3@site.com
PASSWORD:pass3
EMAIL:user4@site.com
HASH:qwerty123
EMAIL:user5@site.com
HASH:somehash
EMAIL:user6@site.com
PASSWORD:pass6

I'm trying to extract only the email-password pairs, excluding the email-hash pairs. This command doesn't work properly in my case:

sed -e 's/.*EMAIL://' -e 's/.*PASSWORD://' -e "/\b\HASH\b/d" test.txt

Expected output:

user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6

7 Answers

sed -n 'N;s/^EMAIL://;s/PASSWORD://p' file
  • N appends the next line to the pattern space;
  • s/^EMAIL:// removes the leading EMAIL:;
  • s/PASSWORD://p removes PASSWORD: and prints the pattern space only if that substitution succeeded.

Tested on the sample input. Assumption: the first line is EMAIL:, the second is PASSWORD: or HASH:, and the pattern repeats.
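For comparison, here is a minimal Python sketch of the same N / s///p mechanics, under the same assumption of strict EMAIL-then-second-line alternation. The function name sed_n_pairs is hypothetical. It is slightly stricter than the sed one-liner: it only accepts PASSWORD: at the start of the second line, whereas s/PASSWORD:// would match it anywhere in the two-line pattern space.

```python
def sed_n_pairs(lines):
    """Rough Python analogue of: sed -n 'N;s/^EMAIL://;s/PASSWORD://p' file

    Assumes strict alternation: an EMAIL: line, then a PASSWORD: or HASH: line.
    """
    it = (line.rstrip("\n") for line in lines)
    out = []
    for first in it:
        # N: pull the next line into the same "pattern space"
        second = next(it, None)
        if second is None:
            break  # dangling last line: with -n, sed prints nothing for it either
        # s/^EMAIL://: drop the tag only at the very start
        if first.startswith("EMAIL:"):
            first = first[len("EMAIL:"):]
        # s/PASSWORD://p: output only when that substitution succeeds
        if second.startswith("PASSWORD:"):
            out += [first, second[len("PASSWORD:"):]]
    return out
```

Passing an open file object works too, since the trailing newline of each line is stripped.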


As a bonus, if there may be blank lines, Awk is the better choice:

awk -F ':' '/^PASSWORD:/{print line;print $2}/^EMAIL:/{line=$2}' file
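The awk version tolerates blank lines because it never reads lines in fixed pairs; it just remembers the most recent EMAIL: value. A rough Python equivalent, with the hypothetical name extract_with_blanks, might look like this. One deliberate difference: awk's $2 stops at the next colon, while partition keeps everything after the first colon.

```python
def extract_with_blanks(lines):
    """Rough analogue of:
    awk -F ':' '/^PASSWORD:/{print line;print $2}/^EMAIL:/{line=$2}' file
    Lines matching neither tag (blank lines, HASH: lines) are ignored.
    """
    out = []
    email = ""  # plays the role of the awk variable `line`
    for raw in lines:
        line = raw.rstrip("\n")
        value = line.partition(":")[2]  # everything after the first colon
        if line.startswith("PASSWORD:"):
            out += [email, value]
        elif line.startswith("EMAIL:"):
            email = value
    return out
```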
Quasímodo

Here are a couple more variations:

paste -d :  - - < myfile | awk -F: '$3 == "PASSWORD" {print $2; print $4}'
tac myfile | awk -F: '$1 == "PASSWORD" {print $2; getline; print $2}' | tac
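paste -d : - - reads standard input twice per output line, so consecutive lines are glued together in pairs. In Python the same pairing trick is zip(it, it) over a single iterator. A sketch, assuming strict line pairs (paste_pairs is a made-up name); note that zip silently drops an unpaired final line, whereas paste would still print it.

```python
def paste_pairs(lines):
    """Glue consecutive lines two at a time (like paste -d : - -), then keep
    pairs whose third colon-separated field is PASSWORD (like awk's $3)."""
    it = iter([line.rstrip("\n") for line in lines])
    out = []
    # zip(it, it) draws twice from the same iterator per step:
    # (line 1, line 2), (line 3, line 4), ...
    for first, second in zip(it, it):
        fields = (first + ":" + second).split(":")
        if len(fields) >= 4 and fields[2] == "PASSWORD":
            out += [fields[1], fields[3]]  # awk's $2 and $4
    return out
```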
glenn jackman

Problem identification: sed is a line-oriented stream editor, so when the decision whether to print a line depends on another line, as in your case, we need to orchestrate a state machine, which in turn needs flip-flops or, as here, variables.

Essentially, we hold back printing until we see the correct state transition; here, that is only the transition from the email-line state to the password-line state.
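The state machine described above can be made explicit. This is a hedged Python sketch, not a translation of the sed code that follows: the State enum and state_machine function are hypothetical names, and unlike the sed hold-space approach it resets on any line that is neither EMAIL: nor PASSWORD: (HASH:, blank, etc.), so only a direct EMAIL -> PASSWORD transition produces output.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()       # nothing useful remembered
    HAVE_EMAIL = auto()  # the previous line was an EMAIL: line


def state_machine(lines):
    """Emit an email/password pair only on the EMAIL -> PASSWORD transition."""
    state, email, out = State.START, None, []
    for raw in lines:
        tag, _, value = raw.rstrip("\n").partition(":")
        if tag == "EMAIL":
            state, email = State.HAVE_EMAIL, value
        elif tag == "PASSWORD" and state is State.HAVE_EMAIL:
            out += [email, value]  # the one transition that triggers output
            state = State.START
        else:
            state = State.START  # HASH: or anything else resets the machine
    return out
```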

Using GNU sed in extended-regex mode (-E), which makes the sed code easier to read and less prone to backslashitis:

$ sed -Ee '
    /^PASSWORD:/!{h;d;}
    x;G;s/(^|\n)[^:]*:/\1/g
' test.txt

The basic idea is to save each line that isn't a password line in the hold space, so that it is available when we actually reach the password line.

With GNU awk, we essentially reimplement the sed logic above, using the awk variable e as the hold space.

$ awk -F: '
    /^PASSWORD:/&&
    ($0=e RS $2)"";{e=$2}
' test.txt

With GNU grep, we use the -B (before-context) option to print one line before each password line, then remove the -- separator lines that grep inserts between non-adjacent groups, assuming nobody uses -- as a password.

$ < test.txt \
  grep -B1 '^PASSWORD:' |
  grep -Fxve -- | cut -d: -f2-
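To make the grep -B1 behaviour concrete, here is a Python sketch of what that pipeline does: print each match plus the preceding line, insert -- only between groups that are not adjacent (as GNU grep does with context options), then drop the separators and the tags. The helper names grep_before_one and strip_separators_and_tags are hypothetical. On the sample input, only one -- appears, between the user3 and user6 groups.

```python
def grep_before_one(lines, prefix="PASSWORD:"):
    """Emulate grep -B1 '^PASSWORD:': each matching line plus the line
    before it, with "--" between groups that are not adjacent."""
    lines = [line.rstrip("\n") for line in lines]
    out = []
    last_printed = -1  # index of the last line already emitted
    for i, line in enumerate(lines):
        if not line.startswith(prefix):
            continue
        # Context starts one line back, but never re-emits printed lines
        start = max(i - 1, last_printed + 1)
        if out and start > last_printed + 1:
            out.append("--")  # gap since the previous group
        out.extend(lines[start:i + 1])
        last_printed = i
    return out


def strip_separators_and_tags(lines):
    """grep -Fxve -- drops lines that are exactly "--";
    cut -d: -f2- keeps everything after the first colon
    (a line with no colon is passed through unchanged, as cut does)."""
    return [line.split(":", 1)[1] if ":" in line else line
            for line in lines if line != "--"]
```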

With perl, whenever we see an EMAIL: line we append the next line to it, then check whether that second line is a PASSWORD: line.

$ perl -ne '
    /^EMAIL:/ && ($_ .= <>);
    /\nPASSWORD:/ && print(s/^[^:]+://mgr);
' test.txt

Using only bash builtins:

while IFS=: read -r a p; do
  case $a in
    'PASSWORD') printf '%s\n' "$e" "$p" ;;
    *) e=$p ;;
  esac
done < test.txt

Using sed to pull out the lines that we're interested in, and then cut to remove the PASSWORD: and EMAIL: strings from the start of each line:

$ sed -n 'N; /\nPASSWORD:/p' file | cut -d : -f 2-
user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6

Using cut here helps keep the sed expression, and therefore the whole command, simple.

Kusalananda

If the data you want printed can't contain any colons:

$ awk -F':' '$1=="PASSWORD"{print prev ORS $2} {prev=$2}' file
user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6

Or, if it can, any of these would work; it's just a matter of preference:

$ awk '{prev=curr; curr=$0; sub(/[^:]+:/,"",curr)} /^PASSWORD:/{print prev ORS curr}' file
user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6

$ awk -F':' '{tag=$1; sub(/[^:]+:/,"")} tag=="PASSWORD"{print prev ORS $0} {prev=$0}' file
user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6

$ awk -F':' '{tag=$1; sub(/[^:]+:/,""); prev=curr; curr=$0} tag=="PASSWORD"{print prev ORS curr}' file
user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6
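The colon caveat is easiest to see side by side. Here is a small Python sketch of the two extraction styles used above (hypothetical helper names field2 and strip_tag): field2 mimics awk -F':' with $2, while strip_tag mimics sub(/[^:]+:/,""), which removes only the tag. Python's re module is not POSIX ERE, but for the pattern [^:]+: it finds the same leftmost match.

```python
import re


def field2(line):
    """Like awk -F':' '{print $2}': only the text between the 1st and 2nd colon."""
    parts = line.split(":")
    return parts[1] if len(parts) > 1 else ""


def strip_tag(line):
    """Like awk sub(/[^:]+:/, ""): remove the first tag and its colon only."""
    return re.sub(r"[^:]+:", "", line, count=1)
```

With a password such as pa:ss, field2 returns only pa, which is why the first command is limited to colon-free data.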

Ed Morton

If that file is always in that format with EMAIL at the start of every other line:

sed -n 'N;s/^EMAIL:\(.*\n\)PASSWORD:/\1/p'

should do it. Or, to be on the safe side, look for EMAIL: as the start of the record:

sed -n '/^EMAIL:/{N;s/^EMAIL:\(.*\n\)PASSWORD:/\1/p;}'

You could also use pcregrep with its multiline mode:

pcregrep -M -o1 -o2 --om-separator=$'\n' '^EMAIL:(.*)\nPASSWORD:(.*)'
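pcregrep -M lets a single regex span a line break. A comparable sketch with Python's re module, assuming the whole file fits in memory (pcregrep_pairs is a hypothetical name): re.MULTILINE makes ^ match at every line start, and . still stops at newlines, so each (.*) group stays within one line.

```python
import re

# Same regex as the pcregrep command; re.MULTILINE lets ^ match at the start
# of every line, and the literal \n lets one match cover two lines.
PAIR = re.compile(r"^EMAIL:(.*)\nPASSWORD:(.*)", re.MULTILINE)


def pcregrep_pairs(text):
    """Return [email, password, email, password, ...], like -o1 -o2
    with a newline as the output separator."""
    out = []
    for email, password in PAIR.findall(text):
        out += [email, password]
    return out
```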
sed "N;s/\n/ /g" filename | awk -F ":" '/PASSWORD/{gsub(" PASSWORD","",$2); print $2 "\n" $3}'

Output:

user1@site.com
pass1
user2@site.com
pass2
user3@site.com
pass3
user6@site.com
pass6

Python

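#!/usr/bin/env python3
# Read every line, stripping the newline and surrounding whitespace
with open('filename') as k:
    m = [line.strip() for line in k]

# Lines come in pairs: EMAIL on even indices, PASSWORD or HASH on odd ones
for j in range(0, len(m) - 1, 2):
    if m[j + 1].startswith("PASSWORD:"):
        print(m[j].split(":", 1)[-1])
        print(m[j + 1].split(":", 1)[-1])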