
I am looking for a way to extract strings from a file using bash, and append them to another file. The file in question contains data with the following format:

Data="/dataset/0001" a bunch of random stuff I don't need Data="/dataset/0002" more random stuff Data="/dataset/0003"

et cetera.

I am looking to extract and return the strings between the double quotes (i.e., /dataset/0001, /dataset/0002, /dataset/0003, etc.).

Any suggestions on how to go about doing this?

As a follow-up question, it would be super neat to be able to prepend a constant string (for example, /home/user) before each returned value (i.e., /home/user/dataset/0001, /home/user/dataset/0002, /home/user/dataset/0003, etc.).

Thanks for any suggestions on this.

Kusalananda

4 Answers

$ grep -o 'Data="[^"]*"' file | sed 's,Data=",/home/user,; s/"$//'
/home/user/dataset/0001
/home/user/dataset/0002
/home/user/dataset/0003

This uses a combination of grep -o and sed to do the extraction and transformation of the data.

The grep -o pulls out each Data="..." bit onto separate lines, while the sed takes each of these lines and first replaces Data=" with /home/user and then deletes the " at the end.
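
To see why the pattern uses [^"]* rather than .*, here is a small sketch. The sample line is modelled on the question's data, and the variable names are only for illustration. [^"] is a bracket expression meaning "any single character that is not a double quote", so [^"]* cannot run past the closing quote, whereas the greedy .* can.

```shell
#!/bin/bash
# Sample line modelled on the question's data (the filler text is made up).
line='Data="/dataset/0001" junk Data="/dataset/0002"'

# [^"]* matches a run of characters that are not double quotes,
# so each match ends at the first closing quote after Data=".
bracket=$(printf '%s\n' "$line" | grep -o 'Data="[^"]*"')

# .* is greedy and may cross quotes, so the single match runs from the
# first Data=" to the last double quote on the line.
greedy=$(printf '%s\n' "$line" | grep -o 'Data=".*"')

printf 'with [^"]*:\n%s\n' "$bracket"
printf 'with .*:\n%s\n' "$greedy"
```

With the bracket expression, grep -o reports two separate matches; with .*, it reports one match covering the whole sample line.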

Kusalananda
  • Thanks Kusalananda, that worked great. I use grep from time to time, but it's all of those "extra" characters that I find the hardest to figure out.

    For instance, I can see what is happening with the wildcard (*) before the double-quote (include everything up to and including the final double-quote), but what is happening inside of the square brackets [^"] ?

    Thanks again, I really appreciate this (and everyone else who contributed).

    – jon danniken Dec 22 '19 at 01:03

With Perl:

$ perl -lnE 'say for map { "/home/user" . $_ } /Data="(.*?)"/g' file
/home/user/dataset/0001
/home/user/dataset/0002
/home/user/dataset/0003
steeldriver

I wouldn't recommend doing it this way, because

but just for the sake of illustration, using repeated application of the bash =~ operator:

#!/bin/bash

pfx="/home/user"

re='Data="([^"]*)"'

while IFS= read -r line; do
  while [[ $line =~ $re ]]; do
    printf '%s%s\n' "$pfx" "${BASH_REMATCH[1]}"
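    # Quote the inner expansion so the matched text is removed literally,
    # not interpreted as a glob pattern.
    line="${line#*"${BASH_REMATCH[0]}"}"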
  done
done < file
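
As a rough end-to-end sketch of the same loop, the following also appends the results to another file, which is what the question asks for. The file names (in.txt, out.txt) and the temporary directory are assumptions made only so the example is self-contained; the important part is the >> redirection on the outer loop.

```shell
#!/bin/bash
# Hypothetical file names; a temporary directory keeps the sketch self-contained.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
infile="$tmpdir/in.txt"
outfile="$tmpdir/out.txt"

# Sample input modelled on the question, plus an output file that already has content.
printf '%s\n' 'Data="/dataset/0001" junk Data="/dataset/0002" more Data="/dataset/0003"' > "$infile"
printf '%s\n' 'existing line' > "$outfile"

pfx="/home/user"
re='Data="([^"]*)"'

while IFS= read -r line; do
  while [[ $line =~ $re ]]; do
    printf '%s%s\n' "$pfx" "${BASH_REMATCH[1]}"
    # Drop everything up to and including the match just reported.
    line=${line#*"${BASH_REMATCH[0]}"}
  done
done < "$infile" >> "$outfile"   # >> appends; a single > would overwrite

cat "$outfile"
```

The same >> redirection can be added to the end of any of the other pipelines on this page.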
steeldriver

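Here are some of the methods you may use to get the output. Note that the grep and sed variants rely on GNU extensions (grep -P, and \n in the sed replacement text):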

perl -lsne '
  () = /Data="(.*?)"(?{print "$v$1"})/g;
' -- -v="/home/user" file 

grep -oP 'Data="\K[^"]+(?=")' file |\
xargs printf '/home/user%s\n'

sed -nEe '
  s|Data="([^"]+)"|\n/home/user\1\n|
  /\n/!d
  s/.*\n(.*\n)/\1/
  P;D
' file 

/home/user/dataset/0001
/home/user/dataset/0002
/home/user/dataset/0003