How do I get xmllint to output to a file?

thufir@dur:~/xmllint$ 
thufir@dur:~/xmllint$ xmllint --html http://www.skynet.be/nieuws-sport/weer/mijn-weer?cityId=6450  --xpath '//div[@class = "tides"]' - 2>/dev/null
<div class="tides">
            <div class="weather-sprite icon  st_nl" title="Marées Oostende"></div>
            <p>Hoogtij: <strong>10:28</strong>  <strong>23:11</strong></p>
            <p>Laagtij: <strong>04:44</strong>  <strong>17:13</strong></p>
            <div class="weather-sprite icon  anv_nl clearFlt" title="Marées Anvers"></div>
            <p>Hoogtij: <strong>00:41</strong>  <strong>13:06</strong></p>
            <p>Laagtij: <strong>07:11</strong>  <strong>07:11</strong></p>
        </div><div class="tides">
            <div class="weather-sprite icon  st_nl" title="Marées Oostende"></div>
            <p>Hoogtij: <strong>11:31</strong>  <strong></strong></p>
            <p>Laagtij: <strong>05:48</strong>  <strong>18:10</strong></p>
            <div class="weather-sprite icon  anv_nl clearFlt" title="Marées Anvers"></div>
            <p>Hoogtij: <strong>01:42</strong>  <strong>14:02</strong></p>
            <p>Laagtij: <strong>08:20</strong>  <strong>08:20</strong></p>
        </div>
^C
thufir@dur:~/xmllint$ 

The command hangs after printing its output, so it has to be killed with Ctrl-C. The manual says:

   --output FILE
       Define a file path where xmllint will save the result of parsing.
       Usually the programs build a tree and save it on stdout, with this
       option the result XML instance will be saved onto a file.

but I can't get that working. I don't need any output on the console at all; I'm only interested in creating the file. The goal is to tidy up the HTML for processing by Saxon.

Thufir
    Hi, it's not what you want, but it could be useful. The tool I use is pup, and the command is `curl "http://www.skynet.be/nieuws-sport/weer/mijn-weer?cityId=6450" | pup 'div.tides' > out.html` – aborruso Jan 05 '19 at 18:29

1 Answer

When xmllint is run with the --html option, some other options, such as --format and --output, appear to be ignored. (I tested this with libxml2 v2.9.4, which ships with macOS High Sierra, and with v2.9.10 from Homebrew.)

Instead, to write xmllint's output to a file, redirect its standard output stream with the > ("greater-than") redirection operator.

Syntax

xmllint --html input.html > output.html

Example

xmllint --html --xpath "//p" http://example.com > output.html 2>/dev/null

Options/arguments:

  • --html — parse the input as HTML.
  • --xpath "//p" — XPath query selecting all <p> tags from the input.
  • http://example.com — input file, in this case downloaded directly from the specified URL.
  • > output.html — redirect standard output stream (stdout) to the specified file.
  • 2>/dev/null — optional: suppress standard error stream (stderr) from the terminal by redirecting it to the null device (/dev/null).

(See this answer for a nice cheat sheet of output/error redirection.)
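
As a minimal sketch of how the two streams are separated, the function emit below is a hypothetical stand-in for xmllint (it is not part of any tool): it writes a result to stdout and a warning to stderr. The file names output.html and errors.log are placeholders as well.

```shell
# Hypothetical stand-in for xmllint: result goes to stdout, diagnostics to stderr.
emit() {
  echo '<p>result</p>'
  echo 'warning: example diagnostic' >&2
}

# stdout is written to output.html; stderr is written to errors.log.
emit > output.html 2> errors.log

cat output.html   # contains only the stdout line
cat errors.log    # contains only the stderr line
```

Replacing errors.log with /dev/null discards the diagnostics instead of saving them.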

HTTPS

Note that xmllint does not appear to support HTTPS at this time (as mentioned in this question). Instead, you could use another utility such as curl or wget to download the file first, then pipe it to xmllint's standard input using the | ("pipe"/"vertical bar") control operator, with - ("hyphen"/"minus") as xmllint's file argument.

curl --silent "https://example.com" | xmllint --html --xpath "//p" - > output.html 2>/dev/null

Options/arguments:

  • --silent or -s — suppress curl's progress meter and error messages (these go to stderr, so they are not passed to xmllint, but they would otherwise clutter the terminal).
  • "https://example.com" — input file that curl will download (over HTTPS, in this case) and pass to xmllint. (Use quotes if the URL contains & or other special characters.)
  • | — pipe the prior command's standard output (curl) to the next command's standard input (xmllint).
  • --html — parse xmllint's input as HTML.
  • --xpath "//p" — XPath query selecting all <p> tags from the input.
  • - — xmllint gets its input from the standard input stream (stdin) (i.e. from curl's output) instead of from a file or URL.
  • > output.html — redirect xmllint's standard output stream (stdout) to the specified file.
  • 2>/dev/null — optional: suppress xmllint's standard error stream (stderr) from the terminal by redirecting it to the null device (/dev/null).

(See this answer for a nice list of control/redirection operators.)
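
The same pipeline can be tried without network access. In this sketch, printf supplies a small made-up HTML document in place of curl's download (an assumption for illustration), and xmllint reads it from stdin via the - argument. It assumes xmllint is installed; the exact formatting of the result may vary between libxml2 versions.

```shell
# Made-up HTML standing in for a page that curl would download.
printf '<html><body><p>first</p><p>second</p></body></html>\n' |
  xmllint --html --xpath '//p' - > output.html 2>/dev/null

# Show what was written to the file.
cat output.html
```

Nothing appears on the terminal until the final cat, because xmllint's stdout goes to output.html and its stderr is discarded.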