I have a very simple HTML file containing a single value. In this case the value is 57.

<eta version="1.0"><value uri="/user/var/48/10391/0/0/12528" strValue="57" unit="%" decPlaces="0" scaleFactor="10" advTextOffset="0">572</value></eta>

What is an easy way to extract this value in a bash script and store it in a variable? Is there a way to avoid having wget save to an intermediate file, so that I don't have to open and read a file, and can work directly with the wget output instead?

To clarify: I could do a simple wget, save to a file, and check the file for the value. Or is there a better way that keeps the wget output in RAM, with no explicit file being stored?

Thanks a million, highly appreciated. Norbert

terdon
  • 242,166
njordan
  • 159

4 Answers

12

You can extract the value in your example with grep and assign it to a variable as follows:

$ x=$(wget -O - 'http://foo/bar.html' | grep -Po '<value.*strValue="\K[[:digit:]]*')
$ echo $x
57

Explanation:

  • $(): command substitution
  • grep -P: enables Perl-compatible regular expressions
  • grep -o: prints only the matched part of the line
  • \K: drops everything matched up to this point from the output
  • wget -O -: prints the downloaded document to standard output (not to a file)
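
The pieces above can be tried without any network access. The following is a minimal sketch that assumes GNU grep built with -P support; the variable name sample is made up for illustration, and printf stands in for wget -O - (adding -q to wget would additionally silence its progress messages on standard error).

```shell
# Sample response copied from the question, so no download is needed.
sample='<eta version="1.0"><value uri="/user/var/48/10391/0/0/12528" strValue="57" unit="%" decPlaces="0" scaleFactor="10" advTextOffset="0">572</value></eta>'

# Same shape as the wget pipeline: the document arrives on standard input
# and is never written to a file. printf plays the role of wget -O - here.
x=$(printf '%s\n' "$sample" | grep -Po '<value.*strValue="\K[[:digit:]]*')

echo "$x"
```

Replacing the printf part with wget -O - and the real URL should give the same result, as long as the server returns a line of this form.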

However, as a general approach it is better to use a dedicated parser for HTML code.

jimmij
  • 47,140
  • +1 but since you're using -P, why not use \d+ instead of [[:digit:]]*? – terdon Nov 12 '14 at 23:29
  • Nice explanations. No temp file would be nice. – geedoubleya Nov 12 '14 at 23:59
  • I know it is a stupid question, but can you show how it would look if I do a wget to a website, so there is no need to store an intermediate local file? – njordan Nov 13 '14 at 12:47
  • @njordan See the update; you just need to use the -O - option with wget, as in terdon's answer. The - means to use standard output for the downloaded document, not a file. – jimmij Nov 13 '14 at 13:08
  • One more question: it seems that [[:digit:]]* only extracts an integer value. I used the same great line to extract another parameter that is a float (e.g., 15,4), and it cuts off at 15. What do I have to do to take the complete string in the "" as a float variable? – njordan Nov 22 '14 at 23:09
  • Try grep -Po '<value.*strValue="\K[[:digit:]]*(,[[:digit:]]+){0,1}'. The {0,1} means that the group inside () may occur zero times or once. So 57,11,22 will match 57,11. – jimmij Nov 22 '14 at 23:38
  • Shouldn't my last question also work directly this way: grep -Po '<value.*strValue="\K[[:digit:]]*,[[:digit:]]' assuming that the value is always a float with one ","?

    Also, isn't there a simpler way to just take everything found between the ""?

    – njordan Nov 25 '14 at 21:39
  • @njordan grep -Po '<value.*strValue="\K[[:digit:]]*,[[:digit:]]' will fail if the value is an integer, and for a float it will match only the first digit after the ,. If that is what you are looking for, that's fine. And yes, there is a very easy way to take everything between "" with awk, in your case: awk -F'"' '{print $6}'. However, you must guarantee that your value is exactly at the 6th position. – jimmij Nov 25 '14 at 21:59
  • Sorry, I have another use case now: what if I need to take a STRING, i.e. everything between ""? Thanks – njordan Dec 11 '14 at 21:59
  • @njordan as I said in my last comment, with awk that would be awk -F'"' '{print $6}'; just change 6 to the position of the string. If you want grep, the crucial regexp would be "[^"]*", but the whole command would depend on the specific case. – jimmij Dec 11 '14 at 22:12
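
The float and string cases from the comments above can be compared side by side. This is only a sketch under stated assumptions: both sample lines and their attribute values are made up for illustration, the grep pattern using [^"]* is one possible way to build on the regexp jimmij mentions rather than a command from the thread, and GNU grep with -P support is assumed.

```shell
# Hypothetical sample lines: one decimal-comma value, one free-text value.
float_line='<eta version="1.0"><value uri="/user/var/example" strValue="15,4" unit="%">154</value></eta>'
text_line='<eta version="1.0"><value uri="/user/var/example" strValue="Heating on" unit="">0</value></eta>'

# Digits with an optional ",digits" part, as suggested in the comments.
f=$(printf '%s\n' "$float_line" | grep -Po '<value.*strValue="\K[[:digit:]]*(,[[:digit:]]+){0,1}')

# Everything up to the next double quote after strValue=" (assumed variant).
s=$(printf '%s\n' "$text_line" | grep -Po 'strValue="\K[^"]*')

# awk: split on double quotes; the value is field 6 only for this exact layout.
a=$(printf '%s\n' "$text_line" | awk -F'"' '{print $6}')

# Shell arithmetic tools generally expect a dot, so convert the decimal comma.
f_dot=$(printf '%s\n' "$f" | tr ',' '.')

echo "$f"
echo "$s"
echo "$a"
echo "$f_dot"
```

The grep variant keys on the attribute name, so it keeps working if attributes are added or reordered, whereas the awk field number has to be adjusted whenever the layout changes.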