How to extract lines from a source file using a reference file and add them to a result file?

Question

I have a problem I cannot figure out. The system is Solaris. Below are oversimplified versions of the source and reference files.

s.txt source file:

dn: cn=task,cn=Groups,dc=domain 
changetype: modify 
add: uniquemember 
uniquemember: cn=user1,cn=users,dc=domain

dn: cn=task,cn=Groups,dc=domain 
changetype: modify 
add: uniquemember 
uniquemember: cn=user9,cn=users,dc=domain

dn: cn=task,cn=Groups,dc=domain 
changetype: modify 
add: uniquemember 
uniquemember: cn=user10,cn=users,dc=domain

r.txt reference file:

uniquemember: cn=user9,cn=users,dc=domain 
uniquemember: cn=user8,cn=users,dc=domain

I'd like a script that uses the uniquemember records in the reference file to extract the matching line from the source file (here, the cn=user9 line) together with the 3 lines above it, and adds them to a result file, usermember_add.ldif:

dn: cn=task,cn=Groups,dc=domain 
changetype: modify 
add: uniquemember 
uniquemember: cn=user9,cn=users,dc=domain

Note that the proposed duplicate depends on non-standard GNU extensions to the POSIX grep utility and isn't applicable to non-GNU systems like Solaris. — Andrew Henle, Sep 01 '18 at 11:54

steeldriver · Accepted Answer · 2018-09-01T14:08:56.240

1

If you are trying to use the values in r.txt as keys to extract matching multi-line records from s.txt, then try

awk 'NR==FNR {u[$2]++; next} $NF in u' r.txt RS= s.txt

Process r.txt with the default (newline) record separator, constructing an associative array u whose keys are the second whitespace-separated field of each line.
Then set the record separator to the empty string (RS=) to switch to paragraph mode for the second file.
Process s.txt in paragraph mode, i.e. treating each blank-line-separated block as a single record, whose last field $NF can then be used as a lookup value in u.
If $NF exists in u, print the whole record.
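
The same one-liner can be laid out with comments to make the two passes easier to follow. This is only a sketch: it recreates the sample files from the question (without their trailing spaces) and writes to usermember_add.ldif, the result file name used in the question.

```shell
# Recreate the sample files from the question (trailing spaces omitted).
cat > s.txt <<'EOF'
dn: cn=task,cn=Groups,dc=domain
changetype: modify
add: uniquemember
uniquemember: cn=user1,cn=users,dc=domain

dn: cn=task,cn=Groups,dc=domain
changetype: modify
add: uniquemember
uniquemember: cn=user9,cn=users,dc=domain

dn: cn=task,cn=Groups,dc=domain
changetype: modify
add: uniquemember
uniquemember: cn=user10,cn=users,dc=domain
EOF

cat > r.txt <<'EOF'
uniquemember: cn=user9,cn=users,dc=domain
uniquemember: cn=user8,cn=users,dc=domain
EOF

awk '
    # First file (r.txt): NR==FNR only holds while reading the first file.
    # Remember the second field of every line as a key.
    NR == FNR { u[$2]++; next }

    # Second file (s.txt), read in paragraph mode because of RS= below.
    # A pattern with no action prints the record when the pattern is true.
    $NF in u
' r.txt RS= s.txt > usermember_add.ldif

cat usermember_add.ldif
```

With the sample data only the cn=user9 record is written, because user8 never appears in s.txt and user1 and user10 are not keys in r.txt.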

To add space between the matched records:

If you have GNU awk (gawk) you can use the special variable RT to add back the original paragraph separators:

gawk 'NR==FNR {u[$2]++; next} $NF in u {print $0 RT}' r.txt RS= s.txt

More generally, you can append a single additional newline after every matched record:

awk 'NR==FNR {u[$2]++; next} $NF in u {print $0 "\n"}' r.txt RS= s.txt

or add an extra newline to the default output record separator (ORS):

awk 'NR==FNR {u[$2]++; next} $NF in u' r.txt RS= ORS='\n\n' s.txt
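
To see the effect with more than one match, here is a small sketch using the ORS variant. The sample data is adapted from the question, with r.txt changed (as an assumption, for illustration) so that both user9 and user10 match; out.ldif is a hypothetical output name.

```shell
# Illustrative data: two of the three records in s.txt are listed in r.txt.
cat > s.txt <<'EOF'
dn: cn=task,cn=Groups,dc=domain
changetype: modify
add: uniquemember
uniquemember: cn=user1,cn=users,dc=domain

dn: cn=task,cn=Groups,dc=domain
changetype: modify
add: uniquemember
uniquemember: cn=user9,cn=users,dc=domain

dn: cn=task,cn=Groups,dc=domain
changetype: modify
add: uniquemember
uniquemember: cn=user10,cn=users,dc=domain
EOF

cat > r.txt <<'EOF'
uniquemember: cn=user9,cn=users,dc=domain
uniquemember: cn=user10,cn=users,dc=domain
EOF

# ORS='\n\n' makes every printed record end with one blank line.
# Both RS= and ORS= are assigned after r.txt has been read, so they
# only affect how s.txt is read and how matches are printed.
awk 'NR==FNR {u[$2]++; next} $NF in u' r.txt RS= ORS='\n\n' s.txt > out.ldif

cat out.ldif
```

Each matched record is followed by exactly one blank line, including the last one, regardless of how many blank lines separated the records in s.txt.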

edited Sep 01 '18 at 14:08

answered Aug 31 '18 at 23:57


awk 'NR==FNR {u[$2]++; next} $NF in u' r.txt RS= s.txt fails with "awk: syntax error near line 1" and "awk: bailing out near line 1". This is Solaris; how can it be modified for Solaris? Thanks. – SeanB Sep 01 '18 at 04:00
@SeanB On Solaris, try using nawk (either /usr/bin/nawk or /usr/xpg4/bin/awk) – Archemar Sep 01 '18 at 09:07
@steeldriver To someone new to scripting, it looks like magic. Two questions: 1) What is the space in RS= s.txt for? Without it, it does not work. 2) How can it be modified so that there is a blank line between multiple matches in the result? – SeanB Sep 01 '18 at 13:54
@SeanB please see updated answer – steeldriver Sep 01 '18 at 14:09
@steeldriver Thanks a lot for the detailed explanations! One question: how can I make gawk 'NR==FNR {u[$2]++; next} $NF in u {print $0 RT}' r.txt RS= s.txt output just one blank line instead of two? The others are fine with one. – SeanB Sep 01 '18 at 14:53
@SeanB if you use gawk's RT then you will get however many blank lines follow the particular record in s.txt whether that's one, several - or none (in the case of a record at the end of the file, with no trailing separator) – steeldriver Sep 01 '18 at 15:31
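
On the two points raised in these comments: RS= followed by a space is an awk command-line assignment that sets RS to the empty string, and s.txt is then a separate file operand, which is why the space matters. For the Solaris syntax error, a possible wrapper is sketched below. It assumes the two paths named by Archemar and falls back to plain awk on other systems; the AWK variable is an illustrative name, and the sample data is a shortened copy of the question's files.

```shell
# Shortened sample data from the question (trailing spaces omitted).
printf '%s\n' \
    'dn: cn=task,cn=Groups,dc=domain' \
    'changetype: modify' \
    'add: uniquemember' \
    'uniquemember: cn=user1,cn=users,dc=domain' \
    '' \
    'dn: cn=task,cn=Groups,dc=domain' \
    'changetype: modify' \
    'add: uniquemember' \
    'uniquemember: cn=user9,cn=users,dc=domain' > s.txt
printf '%s\n' 'uniquemember: cn=user9,cn=users,dc=domain' > r.txt

# Prefer the Solaris locations suggested in the comments; on systems
# where neither exists, assume the default awk is good enough.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
elif [ -x /usr/bin/nawk ]; then
    AWK=/usr/bin/nawk
else
    AWK=awk
fi

# "RS=" and "s.txt" are two separate arguments: an assignment, then a file.
"$AWK" 'NR==FNR {u[$2]++; next} $NF in u' r.txt RS= s.txt > usermember_add.ldif

cat usermember_add.ldif
```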

score 0 · Answer 2 · 2018-09-01T00:22:49.560

As I understand your question, you would like to read a keyword (a user) from r.txt, search for this keyword in s.txt, and print the matching line together with the previous three lines from s.txt. You can write these lines in a file called "code":

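#!/bin/bash
# Print the user9 line from s.txt with the 3 lines before it,
# but only if user9 is listed in r.txt.
# Note: -B is a GNU grep option.
if grep -q user9 r.txt; then
    grep -B 3 user9 s.txt
fi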

Make the file "code" executable (chmod +x code), then run it in a terminal as follows:

./code > add.ldif

The output is:

dn: cn=task,cn=Groups,dc=domain 
changetype: modify 
add: uniquemember 
uniquemember: cn=user9,cn=users,dc=domain

Let's say you have two entries for user9 in s.txt as follows:

dn: cn=task,cn=Groups,dc=domain 
changetype: modify 
add: uniquemember 
uniquemember: cn=user1,cn=users,dc=domain

dn: cn=task,cn=Groups,dc=domain 
changetype: modify 
add: uniquemember 
uniquemember: cn=user9,cn=users,dc=domain

dn: cn=task,cn=Groups,dc=domain 
changetype: modify 
add: uniquemember 
uniquemember: cn=user18,cn=users,dc=domain

dn: cn=XXX,cn=XXX,dc=XXX 
changetype: XXX 
add: XXX 
uniquemember: cn=user9,cn=users,dc=domain

The previous code would find two entries for user9:

dn: cn=task,cn=Groups,dc=domain 
changetype: modify 
add: uniquemember 
uniquemember: cn=user9,cn=users,dc=domain
--
dn: cn=XXX,cn=XXX,dc=XXX 
changetype: XXX 
add: XXX 
uniquemember: cn=user9,cn=users,dc=domain

grep: illegal option -- B. This is Solaris; what is the equivalent of the -B option on Solaris? Thanks. – SeanB, Sep 01 '18 at 03:59
It is not clear exactly what you want. I believe on Solaris you could try replacing grep or egrep with sed. Something like sed -n '1,/user/p' s.txt | tail -4 will print the first line that contains the word "user" together with the previous 3 lines. Adjust as needed. – , Sep 01 '18 at 08:17
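
Since -B is not available in every grep, one possible substitute is an awk script that keeps a rolling buffer of the last three lines. This is a sketch and not part of the answer above: pat and buf are illustrative names, and it assumes an awk that accepts -v (on Solaris that would presumably be nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
# Sample source file from the question (trailing spaces omitted).
cat > s.txt <<'EOF'
dn: cn=task,cn=Groups,dc=domain
changetype: modify
add: uniquemember
uniquemember: cn=user1,cn=users,dc=domain

dn: cn=task,cn=Groups,dc=domain
changetype: modify
add: uniquemember
uniquemember: cn=user9,cn=users,dc=domain

dn: cn=task,cn=Groups,dc=domain
changetype: modify
add: uniquemember
uniquemember: cn=user10,cn=users,dc=domain
EOF

# The trailing comma in the pattern keeps user9 from also matching user90.
awk -v pat='cn=user9,' '
    # On a matching line, print the buffered previous lines (oldest
    # first), then the matching line itself.
    index($0, pat) {
        for (i = NR - 3; i < NR; i++)
            if (i > 0) print buf[i % 3]
        print
    }
    # Store every line in a 3-slot ring buffer indexed by line number.
    { buf[NR % 3] = $0 }
' s.txt > add.ldif

cat add.ldif
```

Unlike grep -B, this does not print a -- separator between groups and does not merge overlapping contexts, so it fits best when matches are at least four lines apart, as in the sample data.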
