
The following useful find command prints the value of the Name tag:

find /tmp -type f -name '*.xml' -exec grep -o -P '(?<=<Name>).*(?=</Name>)' {} \;

The problem is that when there are several XML files under /tmp, we never know which XML file contains the Name tag.

In other words, this find command prints the value of Name,

but without the XML file name.

Please advise how to print the file name when grep matches the pattern:

(?<=<Name>).*(?=</Name>)
yael

4 Answers


Use a proper XML parser (here, xmlstarlet) to extract the values of all Name nodes in all XML files that have a .xml filename suffix in or under /tmp:

find /tmp -type f -name '*.xml' -exec xmlstarlet sel -t -v '//Name' -nl {} + 

This does not require that the <Name> opening tag and the corresponding </Name> closing tag are on the same line, nor does it require that the Name node has no attributes, like your grep command does.

To output a bit more information with xmlstarlet, such as the name of the file currently being processed, and to do so only if the file actually has a Name node, replace the xmlstarlet invocation in the find command above with:

xmlstarlet sel -t -i '//Name' -o '### ' -f -o ':' -nl -v '//Name' -nl

This outputs the pathname of the XML file, prefixed by ### and suffixed by : and a newline, but only if the file contains a Name node. After that come the values of all Name nodes in the document.


Using grep:

grep will always output the filename of the file containing the match if more than one file is given on the command line. If you only pass it one file, no filename will be printed.
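A quick way to see this is to run the same grep against one file and then against two. The sketch below is illustrative only: the temporary directory and the files a.xml and b.xml with their contents are made up for the demo, and it assumes GNU grep built with PCRE support, since -P is not available in every grep.

```shell
#!/bin/sh
# Demo (hypothetical files): grep prefixes each match with "pathname:"
# only when it is given more than one file.
# Assumes GNU grep with PCRE support for -P.
dir=$(mktemp -d)
printf '<Root><Name>alpha</Name></Root>\n' > "$dir/a.xml"
printf '<Root><Name>beta</Name></Root>\n' > "$dir/b.xml"

# One file on the command line: only the matched value is printed.
one=$(grep -o -P '(?<=<Name>).*(?=</Name>)' "$dir/a.xml")
echo "$one"   # prints: alpha

# Two files: each match is prefixed with the file's pathname and a colon.
two=$(grep -o -P '(?<=<Name>).*(?=</Name>)' "$dir/a.xml" "$dir/b.xml")
echo "$two"   # "$dir/a.xml:alpha" on one line, "$dir/b.xml:beta" on the next

rm -r "$dir"
```

This is exactly the situation in the question: find -exec ... {} \; hands grep a single file per invocation, so no filename is ever printed.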

To force grep to always print the filename along with the actual match, add /dev/null as an extra file argument to grep:

find /tmp -type f -name '*.xml' -exec grep -o -P '(?<=<Name>).*(?=</Name>)' {} /dev/null \;

Or, for potentially fewer invocations of grep, use find -exec grep ... {} + instead:

find /tmp -type f -name '*.xml' -exec grep -o -P '(?<=<Name>).*(?=</Name>)' /dev/null {} +
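The sketch below runs both variants against a throwaway directory instead of /tmp (the file names and contents are hypothetical) and assumes GNU grep with -P support. b.xml has no Name node, so only a.xml shows up, and /dev/null never contributes a match because it is empty.

```shell
#!/bin/sh
# Sketch (hypothetical files): /dev/null is empty, so it never matches,
# but it guarantees grep sees at least two files and prints "pathname:".
dir=$(mktemp -d)
printf '<Root><Name>alpha</Name></Root>\n' > "$dir/a.xml"
printf '<Root><Other>x</Other></Root>\n' > "$dir/b.xml"

# One grep per file, with /dev/null as the second file.
per_file=$(find "$dir" -type f -name '*.xml' \
  -exec grep -o -P '(?<=<Name>).*(?=</Name>)' {} /dev/null \;)

# As few grep invocations as possible, via {} +.
batched=$(find "$dir" -type f -name '*.xml' \
  -exec grep -o -P '(?<=<Name>).*(?=</Name>)' /dev/null {} +)

echo "$per_file"   # "$dir/a.xml:alpha"
echo "$batched"    # "$dir/a.xml:alpha"

rm -r "$dir"
```

Both forms print the same thing here; the {} + form simply starts fewer grep processes when there are many files.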

At least GNU grep as well as grep on OpenBSD and FreeBSD also support the -H flag to always print the filename, even if only one file is given. Since you used grep -P, you're probably using GNU grep anyway.

Kusalananda

Simply give grep the -H option; then the filename will always be printed, even if there is only one file to search (as in your case).
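A minimal sketch of the question's command with -H added, run against a made-up file in a temporary directory rather than /tmp. It assumes GNU grep with -P support; note that -H is not part of POSIX grep.

```shell
#!/bin/sh
# Sketch (hypothetical file): -H forces the "pathname:" prefix even though
# find -exec ... \; gives grep only one file per invocation.
dir=$(mktemp -d)
printf '<Root><Name>alpha</Name></Root>\n' > "$dir/a.xml"

out=$(find "$dir" -type f -name '*.xml' \
  -exec grep -H -o -P '(?<=<Name>).*(?=</Name>)' {} \;)
echo "$out"   # "$dir/a.xml:alpha"

rm -r "$dir"
```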


Note: grep is not the right tool for parsing XML/HTML documents and won't give a robust, solid solution. Use a proper XML/HTML parser such as xmlstarlet:

find /tmp -type f -name '*.xml' -exec xmlstarlet sel -t -m "//Name" -f -n {} \;
  • xmlstarlet sel -t -m "//Name" -f -n - prints the input file name (ensured by option -f) only if the input XML document matches (-m) the XPath expression "//Name", once for each matching node
  • xmlstarlet is not installed on my machine; can you change it to use xmllint? – yael Jan 18 '18 at 08:23
  • I guess the same approach can't be done with xmllint? – yael Jan 18 '18 at 08:51
  • @yael, xmllint is not able to print its input filename as the only output (only within a debug trace). It's doable, and I have such a solution, but it requires a subshell invocation and is a little longer than the above approach. So if you need it, I have it. – RomanPerekhrest Jan 18 '18 at 08:54

If you don't mind the filename being printed after the matched lines, there's always find's -print option:

find /tmp -type f -name '*.xml' -exec grep -o -P '(?<=<Name>).*(?=</Name>)' {} \; -print
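This works because -exec ... \; also acts as a test: it is true only when the command exits with status 0, which for grep means a match was found. The trailing -print therefore fires only for files that contain a match, after grep has printed that file's values. A small demonstration with hypothetical files in a temporary directory, assuming GNU grep with -P support:

```shell
#!/bin/sh
# Sketch (hypothetical files): -print runs only when the preceding
# -exec grep ... \; exits 0, i.e. only for files where grep matched.
dir=$(mktemp -d)
printf '<Root><Name>alpha</Name></Root>\n' > "$dir/a.xml"
printf '<Root><Other>x</Other></Root>\n' > "$dir/b.xml"

out=$(find "$dir" -type f -name '*.xml' \
  -exec grep -o -P '(?<=<Name>).*(?=</Name>)' {} \; -print)
echo "$out"   # "alpha", then "$dir/a.xml"; b.xml is never listed

rm -r "$dir"
```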