To use a proper XML parser (here I use xmlstarlet) to extract the values of all Name nodes in all XML files that have a .xml filename suffix in or under /tmp:
find /tmp -type f -name '*.xml' -exec xmlstarlet sel -t -v '//Name' -nl {} +
This does not require that the <Name> opening tag and the corresponding </Name> closing tag are on the same line, nor does it require that the Name node has no attributes, like your grep command does.
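As a rough illustration of that difference, the sketch below writes a small made-up document (the file name sample.xml and its contents are assumptions, not taken from the question) and runs the grep pattern against it. It assumes a grep with -P support, such as GNU grep. Only the Name node that sits on one line without attributes is found; the xmlstarlet call is skipped if xmlstarlet is not installed.

```shell
# Hypothetical sample document; names and values are made up for illustration.
tmpdir=$(mktemp -d)
cat >"$tmpdir/sample.xml" <<'EOF'
<root>
  <Name>alpha</Name>
  <Name id="2">beta</Name>
  <Name>
    gamma
  </Name>
</root>
EOF

# The look-around pattern only matches a bare <Name>...</Name> pair
# that opens and closes on the same line.
grep_out=$(grep -o -P '(?<=<Name>).*(?=</Name>)' "$tmpdir/sample.xml")
printf 'grep found: %s\n' "$grep_out"

# The XML parser also sees the node with an attribute and the multi-line node.
if command -v xmlstarlet >/dev/null 2>&1; then
    xmlstarlet sel -t -v '//Name' -nl "$tmpdir/sample.xml"
fi

rm -rf "$tmpdir"
```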
To output a bit more information with xmlstarlet, like the filename that is currently being processed, and to only do that if the file actually has a Name node, replace the xmlstarlet invocation in the find command above with
xmlstarlet sel -t -i '//Name' -o '### ' -f -o ':' -nl -v '//Name' -nl
This outputs the pathname of the XML file, prefixed by ### and suffixed by : and a newline, but only if the file contains a Name node. After that come the values of all Name nodes in the document.
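A minimal way to try this out, assuming xmlstarlet is installed under that name: the sketch below creates two made-up files (has-name.xml and no-name.xml are hypothetical names) in a scratch directory and runs the combined find command on it. Only the file that contains a Name node should show up in the report.

```shell
# Scratch directory with one file that has Name nodes and one that has none.
# File names and contents are assumptions made for this example.
demo_dir=$(mktemp -d)
printf '<root><Name>alpha</Name><Name id="2">beta</Name></root>\n' >"$demo_dir/has-name.xml"
printf '<root><Other>x</Other></root>\n' >"$demo_dir/no-name.xml"

report=
if command -v xmlstarlet >/dev/null 2>&1; then
    # -i '//Name' wraps the rest of the template in a test for a Name node,
    # so files without one produce no output at all.
    report=$(find "$demo_dir" -type f -name '*.xml' -exec \
        xmlstarlet sel -t -i '//Name' -o '### ' -f -o ':' -nl -v '//Name' -nl {} +)
    printf '%s\n' "$report"
else
    echo 'xmlstarlet not found, skipping the demonstration' >&2
fi

rm -rf "$demo_dir"
```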
Using grep:
grep will always output the filename of the file containing the match if more than one file is given on the command line. If you only pass it one file, no filename will be printed.
To force grep to always print the filename along with the actual match, add /dev/null as an extra file for grep to search:
find /tmp -type f -name '*.xml' -exec grep -o -P '(?<=<Name>).*(?=</Name>)' {} /dev/null \;
Or, for potentially fewer invocations of grep, use find -exec grep ... {} + instead:
find /tmp -type f -name '*.xml' -exec grep -o -P '(?<=<Name>).*(?=</Name>)' /dev/null {} +
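The effect of the extra /dev/null operand can be checked with a small experiment. This is only a sketch: the file one.xml is made up, and it assumes a grep that supports -P, such as GNU grep.

```shell
# One hypothetical XML file in a scratch directory.
tmpdir=$(mktemp -d)
printf '<Name>alpha</Name>\n' >"$tmpdir/one.xml"

# A single file operand: only the match itself is printed.
single=$(grep -o -P '(?<=<Name>).*(?=</Name>)' "$tmpdir/one.xml")

# Two file operands: /dev/null never matches, but its presence makes grep
# prefix each match with the name of the file it came from.
with_null=$(grep -o -P '(?<=<Name>).*(?=</Name>)' /dev/null "$tmpdir/one.xml")

printf 'one operand:    %s\n' "$single"
printf 'with /dev/null: %s\n' "$with_null"

rm -rf "$tmpdir"
```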
GNU grep, as well as grep on at least OpenBSD and FreeBSD, also supports the -H flag to always print the filename, even if only one file is given. Since you used grep -P, you're probably using GNU grep anyway.
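With -H available, the /dev/null operand is no longer needed. The following sketch assumes GNU grep (for both -H and -P) and uses a made-up file, one.xml, in a scratch directory instead of /tmp.

```shell
# One hypothetical XML file in a scratch directory.
tmpdir=$(mktemp -d)
printf '<Name>alpha</Name>\n' >"$tmpdir/one.xml"

# -H prints the filename even though grep receives a single file operand here.
with_H=$(find "$tmpdir" -type f -name '*.xml' \
    -exec grep -H -o -P '(?<=<Name>).*(?=</Name>)' {} +)
printf '%s\n' "$with_H"

rm -rf "$tmpdir"
```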
-H is an extension to the standard grep supported by GNU grep and a few other implementations. – Kusalananda Jan 18 '18 at 08:23