Using xq (part of yq, a jq-like parser collection for YAML, XML, and TOML, from https://kislyuk.github.io/yq/), because xmlstarlet is too strict about your missing namespace declaration (see the end of this answer for an xmlstarlet solution anyway).
xq -r --arg title "OYSTER" --arg class "NAME" '
(.. | select(."@d:title"? == $title)) |
(.. | select(."@class"? == $class))."#text"' file.xml
This recursively selects any document node that has a d:title attribute with the value OYSTER. The initial @ in the expression denotes a node's attribute rather than a child node's name.

These nodes (only one in the example) are then searched recursively for any node that has a class attribute with the value NAME. For each such node, the node's text value ("#text") is extracted.
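To make the evaluation order concrete, here is a rough Python model of that filter. It is a sketch of the semantics, not how xq is implemented, and it assumes the XML has already been turned into the nested dicts and lists that xq builds.

The helper names descend() (standing in for jq's ..) and find_text() (standing in for the whole filter) are hypothetical. DOC is a trimmed copy of the converted document.

```python
from typing import Any, Iterator

# Trimmed copy of the JSON that xq builds from the question's XML.
DOC = {
    "root": {
        "d:entry": {
            "@d:title": "OYSTER",
            "span": [
                {"@class": "foot", "span": {"@role": "text", "#text": "foo"}},
                {"@class": "sg", "span": {"@id": "004", "span": [
                    {"@class": "NAME", "d:def": None, "#text": "GUYBRUSH THREEPWOOD"},
                    {"@role": "text", "@class": "bar", "#text": ":"},
                ]}},
            ],
        }
    }
}


def descend(value: Any) -> Iterator[Any]:
    """Rough model of jq's `..`: the value itself, then everything nested in it."""
    yield value
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return  # strings, numbers and null have no children
    for child in children:
        yield from descend(child)


def find_text(doc: Any, title: str, cls: str) -> Iterator[Any]:
    """Model of (.. | select(."@d:title"? == $title)) | (.. | select(."@class"? == $class))."#text"."""
    for entry in descend(doc):
        # ."@d:title"? only yields a value for objects, so only dicts can match
        if isinstance(entry, dict) and entry.get("@d:title") == title:
            for node in descend(entry):
                if isinstance(node, dict) and node.get("@class") == cls:
                    yield node.get("#text")  # a missing #text would print as null in xq


if __name__ == "__main__":
    print(list(find_text(DOC, "OYSTER", "NAME")))  # ['GUYBRUSH THREEPWOOD']
```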
The two strings OYSTER and NAME are bound to the jq variables $title and $class on the command line with the --arg option.
The output, given the document in the question:
GUYBRUSH THREEPWOOD
If nodes other than d:entry can have a d:title attribute, and/or nodes other than span can have a class attribute, and you want to avoid matching these attributes in the wrong type of node, then make sure that you only look in the appropriate nodes:
xq -r --arg title "OYSTER" --arg class "NAME" '
(.. | ."d:entry"? | select(."@d:title"? == $title)) |
(.. | .span?[]? | select(."@class"? == $class))."#text"' file.xml
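A rough Python model of this stricter filter follows. It is a sketch, and descend(), as_list() and find_text_strict() are hypothetical names. It assumes xq's usual conversion, in which a lone child element becomes an object and repeated children become a list.

As far as I can tell, .span?[]? in the jq filter iterates over an object's values when a span has no siblings, so a lone matching span would be missed. The same applies to d:entry when there are several entries. The sketch deliberately normalizes both shapes with as_list(), so it is slightly more permissive than the jq expression above.

```python
from typing import Any, Iterator


def descend(value: Any) -> Iterator[Any]:
    """Rough model of jq's `..`: the value itself, then everything nested in it."""
    yield value
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return
    for child in children:
        yield from descend(child)


def as_list(value: Any) -> list:
    """xq maps one child element to an object and repeated ones to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_text_strict(doc: Any, title: str, cls: str) -> Iterator[Any]:
    """Only d:entry nodes are tested for d:title, and only span nodes for class."""
    for node in descend(doc):
        if not isinstance(node, dict):
            continue
        for entry in as_list(node.get("d:entry")):
            if not (isinstance(entry, dict) and entry.get("@d:title") == title):
                continue
            for inner in descend(entry):
                if not isinstance(inner, dict):
                    continue
                for span in as_list(inner.get("span")):
                    if isinstance(span, dict) and span.get("@class") == cls:
                        yield span.get("#text")


if __name__ == "__main__":
    # A lone <span class="NAME"> becomes an object, not a one-element list.
    single = {"root": {"d:entry": {"@d:title": "OYSTER",
                                   "span": {"@class": "NAME", "#text": "GUYBRUSH THREEPWOOD"}}}}
    print(list(find_text_strict(single, "OYSTER", "NAME")))  # ['GUYBRUSH THREEPWOOD']
```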
For reference, since xq actually calls jq with a JSON document behind the scenes, this is the JSON document that your XML document is translated into:
{
  "root": {
    "d:entry": {
      "@d:title": "OYSTER",
      "span": [
        {
          "@class": "foot",
          "span": {
            "@role": "text",
            "#text": "foo"
          }
        },
        {
          "@class": "sg",
          "span": {
            "@id": "004",
            "span": [
              {
                "@role": "text",
                "span": {
                  "@class": "pos",
                  "span": {
                    "@class": "baz",
                    "#text": "tart"
                  },
                  "d:pos": null
                }
              },
              {
                "@id": "005",
                "@class": "star",
                "span": [
                  {
                    "@class": "NAME",
                    "d:def": null,
                    "#text": "GUYBRUSH THREEPWOOD"
                  },
                  {
                    "@role": "text",
                    "@class": "bar",
                    "#text": ":"
                  },
                  {
                    "@role": "text",
                    "@class": "grog",
                    "span": [
                      {
                        "@class": "ex",
                        "#text": "pirate"
                      },
                      {
                        "@class": "parrot",
                        "#text": "."
                      }
                    ]
                  }
                ]
              }
            ]
          }
        }
      ]
    }
  }
}
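As far as I know, xq performs this conversion with the xmltodict library, which parses with expat and has namespace processing off by default. The following standard-library-only sketch imitates the conventions visible in the JSON:

- attributes are prefixed with @
- text goes into #text
- a text-only element collapses to a plain string
- repeated children are collected into a list
- empty elements become null

It is an approximation, not xmltodict itself, and xml_to_dict() is a hypothetical name. It also illustrates why the missing declaration is not a problem for xq: with namespace processing off, expat treats d:entry as an ordinary element name.

```python
import json
from xml.parsers import expat


def xml_to_dict(xml_text: str) -> dict:
    """Convert XML into nested dicts in an xmltodict-like layout (sketch)."""
    parser = expat.ParserCreate()  # no namespace_separator: "d:entry" stays a plain name
    stack = [{}]   # dicts under construction; stack[0] is the document itself
    texts = [[]]   # text fragments collected for each open element

    def start(name, attrs):
        stack.append({"@" + key: value for key, value in attrs.items()})
        texts.append([])

    def end(name):
        node = stack.pop()
        text = "".join(texts.pop()).strip()
        if text:
            node["#text"] = text
        if not node:
            value = None                  # empty element -> null
        elif list(node) == ["#text"]:
            value = text                  # text-only element -> plain string
        else:
            value = node
        parent = stack[-1]
        if name in parent:                # repeated child -> list
            if not isinstance(parent[name], list):
                parent[name] = [parent[name]]
            parent[name].append(value)
        else:
            parent[name] = value

    def chars(data):
        texts[-1].append(data)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return stack[0]


if __name__ == "__main__":
    xml = ('<root><d:entry d:title="OYSTER"><span class="NAME"><d:def/>'
           'GUYBRUSH THREEPWOOD</span><span class="bar">:</span></d:entry></root>')
    print(json.dumps(xml_to_dict(xml), indent=2))
```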
Assuming the document has a proper declaration of the d namespace, xmlstarlet may be used to extract the wanted text like so:
xmlstarlet sel -t \
    -m '//d:entry[@d:title = "OYSTER"]' \
    -v './/span[@class = "NAME"]' -nl file.xml
Or, with variables set on the command line with --var (note that the quotation marks are part of the values, since each value is evaluated as an XPath expression, here a string literal):
xmlstarlet sel -t --var title='"OYSTER"' --var class='"NAME"' \
    -m '//d:entry[@d:title = $title]' \
    -v './/span[@class = $class]' -nl file.xml
Both of these start by matching any d:entry node whose d:title attribute is OYSTER. For each such matching node, they recursively look for span nodes below it that have a class attribute with the value NAME, and output the value of each such node.
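For comparison, here is a rough Python equivalent of the xmlstarlet commands using xml.etree.ElementTree. It is a sketch, and lookup() is a hypothetical name.

Like xmlstarlet, ElementTree resolves namespace prefixes, so the document must declare d. The URI urn:example:d below is a hypothetical placeholder for whatever your document actually declares. The span search runs inside each matched entry, like a relative .//span path.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: substitute the xmlns:d value from your document.
D_NS = "urn:example:d"

SAMPLE = """<root xmlns:d="urn:example:d">
  <d:entry d:title="OYSTER">
    <span class="sg">
      <span class="NAME"><d:def/>GUYBRUSH THREEPWOOD</span>
      <span class="bar">:</span>
    </span>
  </d:entry>
</root>"""


def lookup(xml_text: str, title: str, cls: str) -> list:
    """Mimic -m '//d:entry[@d:title = $title]' -v './/span[@class = $class]'."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.iter(f"{{{D_NS}}}entry"):            # every d:entry element
        if entry.get(f"{{{D_NS}}}title") != title:         # namespaced attribute
            continue
        for span in entry.iter("span"):                    # spans below this entry only
            if span.get("class") == cls:
                # XPath string value: all descendant text, concatenated
                results.append("".join(span.itertext()))
    return results


if __name__ == "__main__":
    print(lookup(SAMPLE, "OYSTER", "NAME"))  # ['GUYBRUSH THREEPWOOD']
    try:
        ET.fromstring("<root><d:entry/></root>")  # no xmlns:d declaration
    except ET.ParseError as err:
        print("rejected:", err)  # the undeclared prefix is an error, as with xmlstarlet
```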
d, that you haven't declared. – Kusalananda May 16 '21 at 16:59