How to capture the string that comes after a specific word in a CSV line

Question

For example, this is the CSV line, and we want to extract the strings that come directly after /data/:

status=true /data/sdb/hadoop/hdfs/log,/data/sdc/hadoop/hdfs/log,/data/sdd/hadoop/hdfs/log,/data/sde/hadoop/hdfs/log,/data/sdf/hadoop/hdfs/log

Example of the expected results:

sdb
sdc
sdd
sde
sdf

Just for completeness: is the status=true part not separated by a ,? — AdminBee, Mar 03 '20 at 11:29
What should the output be if /data/sdb/foo/data/bar existed in the CSV? What if a field was /foo/bar/data/ (i.e. nothing after /data/)? — Ed Morton, Mar 03 '20 at 15:57

score 4 · Accepted Answer · answered Mar 03 '20 at 11:30

Use grep:

With PCRE support (the -P option, as in GNU grep):

grep -Po '/data/\K[^/]*'

If that is not available:

grep -o '/data/[^/]*' | cut -d'/' -f3
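
As a rough illustration, here are both variants applied to the sample line from the question. The function names `extract_pcre` and `extract_portable` and the variable `line` are made up for this sketch, and the PCRE variant assumes a grep built with `-P` support.

```shell
# Sample line from the question, kept in a variable for the demonstration.
line='status=true /data/sdb/hadoop/hdfs/log,/data/sdc/hadoop/hdfs/log,/data/sdd/hadoop/hdfs/log,/data/sde/hadoop/hdfs/log,/data/sdf/hadoop/hdfs/log'

# PCRE variant (assumes grep supports -P): \K discards "/data/" from the
# reported match, so only the following path component is printed.
extract_pcre() {
    grep -Po '/data/\K[^/]*'
}

# Variant without PCRE: grep -o prints matches such as "/data/sdb", one per
# line; cut keeps field 3 (field 1 is the empty string before the first slash).
extract_portable() {
    grep -o '/data/[^/]*' | cut -d'/' -f3
}

printf '%s\n' "$line" | extract_portable
# prints sdb, sdc, sdd, sde, sdf, one per line
```

Where `-P` is available, `printf '%s\n' "$line" | extract_pcre` should print the same five names.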

answered Mar 03 '20 at 11:30

pLumo

22,565

score 1 · Answer 2 · answered Mar 03 '20 at 11:55

@pLumo absolutely has the right answer. If, for whatever reason, you wanted to use awk and bash's builtin parameter expansion, all the while being slightly convoluted...

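while IFS= read -r line; do
    LINE_COUNTER=0                       # reset for every input line
    COUNT_SEP="${line//[^,]}"            # keep only the commas, to count the separators
    # Start at column 1: in the sample line, "status=true" shares column 1 with /data/sdb/...
    for col in $(seq 1 $((${#COUNT_SEP}+1))); do
        LINE_COUNTER=$((LINE_COUNTER+1))
        COLUMN=$(echo "${line}" | awk -v variable="${col}" -F, '{ print $variable }')
        if [ "$LINE_COUNTER" -eq 1 ]
        then
            echo "${COLUMN}" > /tmp/splitCSV     # first column: truncate the scratch file
        else
            echo "${COLUMN}" >> /tmp/splitCSV
        fi
    done
    # Print the path component that follows /data/ in each column
    while IFS= read -r splitCol; do
        echo "${splitCol}" | awk -F'/data/' '{ print $2 }' | awk -F'/' '{ print $1 }'
    done < /tmp/splitCSV
done < test.csv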

answered Mar 03 '20 at 11:55

Jake Ireland

215

You should never do that. See "Why is using a shell loop to process text considered bad practice?" for some of the reasons. – Ed Morton Mar 03 '20 at 15:59

Thanks! I didn't know that was best practice. Very interesting. – Jake Ireland Mar 03 '20 at 18:51

Yeah, the guys who invented shell to manipulate files and processes also invented tools like awk for shell to call to manipulate text. So, horses for courses... never write a shell loop just to manipulate text and you can't go wrong. – Ed Morton Mar 04 '20 at 14:41
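
To make the advice in these comments concrete, here is one possible way to do the same job in a single awk call, with no shell loop and no temporary file. It is only a sketch: the function name `extract_with_awk` and the variable `line` are invented for the example, and it assumes that columns are separated by commas and that the wanted name is the path component right after /data/.

```shell
# Sample line from the question.
line='status=true /data/sdb/hadoop/hdfs/log,/data/sdc/hadoop/hdfs/log,/data/sdd/hadoop/hdfs/log,/data/sde/hadoop/hdfs/log,/data/sdf/hadoop/hdfs/log'

# Reads the files given as arguments, or standard input when there are none.
extract_with_awk() {
    awk -F, '{
        for (i = 1; i <= NF; i++) {
            # Split the column around "/data/"; parts[2] is what follows it.
            n = split($i, parts, "/data/")
            if (n > 1) {
                sub("/.*", "", parts[2])   # drop everything from the next slash on
                print parts[2]
            }
        }
    }' "$@"
}

printf '%s\n' "$line" | extract_with_awk
# prints sdb, sdc, sdd, sde, sdf, one per line
```

With a file, the call would be `extract_with_awk test.csv`.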

score 1 · Answer 3 · answered Mar 03 '20 at 12:07

Just to add an option, bearing in mind that the only path components here with exactly three characters between slashes are the ones we want, with grep and sed:

grep -o "/.../" file | sed 's;/;;g'

Output:

sdb
sdc
sdd
sde
sdf
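
The three-character assumption is easy to check with a quick sketch. The function name `three_char_components` and the second input path are hypothetical, chosen only to show what happens when another three-character component appears.

```shell
# Prints every three-character run enclosed by slashes, with the slashes removed.
three_char_components() {
    grep -o '/.../' | sed 's;/;;g'
}

# Two columns shaped like the ones in the question: only sdb and sdc match.
printf '%s\n' '/data/sdb/hadoop/hdfs/log,/data/sdc/hadoop/hdfs/log' | three_char_components

# Hypothetical path with another three-character component: tmp is printed too.
printf '%s\n' '/data/sdb/x/tmp/log' | three_char_components
```

So this approach fits the sample data, but it is tied to the length of the names rather than to the /data/ prefix.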

answered Mar 03 '20 at 12:07

schrodingerscatcuriosity

12,396

score 1 · Answer 4 · answered Mar 03 '20 at 13:02

For the above input, the command below will work:

perl -pe 's/,/\n/g' filename | awk -F '/data/' '{gsub("/.*","",$2); print $2}'

Output:

sdb
sdc
sdd
sde
sdf

answered Mar 03 '20 at 13:02

Praveen Kumar BS

5,211

score 1 · Answer 5 · answered Mar 03 '20 at 14:22

This works for me with awk:

awk -F'/' '{for(i=1;i<=NF;i++) if($i=="data") print $(i+1)}' file

1: -F defines field separator as /

2: loop on every field on each line

3: if field equals "data" print next field
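
The three steps above can be tried on the sample line from the question as follows. The wrapper name `fields_after_data` and the variable `line` are assumptions made for this sketch.

```shell
# Sample line from the question.
line='status=true /data/sdb/hadoop/hdfs/log,/data/sdc/hadoop/hdfs/log,/data/sdd/hadoop/hdfs/log,/data/sde/hadoop/hdfs/log,/data/sdf/hadoop/hdfs/log'

# Split each line on "/" and print the field that follows every field
# that is exactly "data". Reads the named files, or standard input if none.
fields_after_data() {
    awk -F'/' '{ for (i = 1; i <= NF; i++) if ($i == "data") print $(i + 1) }' "$@"
}

printf '%s\n' "$line" | fields_after_data
# prints sdb, sdc, sdd, sde, sdf, one per line
```

Because every field equal to "data" triggers a print, a path such as /data/sdb/foo/data/bar would yield both sdb and bar.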

answered Mar 03 '20 at 14:22

Clement

57

score 1 · Answer 6 · answered Mar 03 '20 at 17:06

We can choose from the following:

awk -F/ '
    BEGIN { OFS = RS }
    {
        N = split($0, a, /\//)
        $0 = ""
        for ( i=j=1; i<N; i++ )
            if ( a[i] == "data" )
                $(j++) = a[++i]
    }N>1' file.csv


perl -F/ -lane '
   shift(@F) eq q(data) and print(shift(@F)) 
      while(@F && m{/data/});
' file.csv


perl -lne 'print for m{/data/([^/,]+)}g' file.csv


sed -re '
    /\n/{P;D;}
    s:/data/([^/,]+):\n\1\n:
    D
' file.csv
