
How can I make browse-url follow URLs with line breaks?

Example from RFC 1738:

Yes, Jim, I found it under <URL:ftp://info.cern.ch/pub/www/doc;
type=d> but you can probably pick it up from <URL:ftp://ds.in
ternic.net/rfc>.  Note the warning in <URL:http://ds.internic.
net/instructions/overview.html#WARNING>.

Example from RFC 3986, which updates 1738:

Yes, Jim, I found it under "http://www.w3.org/Addressing/",
but you can probably pick it up from <ftp://foo.example.
com/rfc/>.  Note the warning in <http://www.ics.uci.edu/pub/
ietf/uri/historical.html#WARNING>.

Update

I experimented a bit. URLs marked up with the `URL:` prefix are matched by the regexp stored in thing-at-point-markedup-url-regexp. By default, its character class excludes line breaks. I changed it to:

"<URL:\\([^<>]+\\)>"
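To see what removing the newline from the character class changes, the two patterns can be compared on a wrapped URL. This is only a sketch: it assumes the default value is `"<URL:\\([^<>\n]+\\)>"`, which is what I read in the thingatpt.el shipped with Emacs 24.5, and the names `single-line` and `multi-line` are made up for the illustration.

```elisp
;; Compare the assumed default pattern with the modified one on a URL
;; that is broken across two lines (taken from the RFC 1738 example).
(let ((text "pick it up from <URL:ftp://ds.in\nternic.net/rfc>.")
      (single-line "<URL:\\([^<>\n]+\\)>")   ; assumed default value
      (multi-line  "<URL:\\([^<>]+\\)>"))    ; modified value from above
  (list (string-match single-line text)      ; no match: newline is excluded
        (and (string-match multi-line text)
             (match-string 1 text))))        ; match still contains the newline
;; => (nil "ftp://ds.in\nternic.net/rfc")
```

Note that the modified pattern returns the URL with the line break still inside it, so the result cannot be handed to browse-url without further cleanup.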

Furthermore, the function thing-at-point--bounds-of-markedup-url, which applies this regexp, limits its search to the current line by passing `end` (the end of the line) as the search bound. This can be fixed by changing:

(and (re-search-forward thing-at-point-markedup-url-regexp
                        end 1)

to:

(and (re-search-forward thing-at-point-markedup-url-regexp
                        nil 1)

These are just first steps that I am noting here for reference. A lot more work would be needed to properly detect and clean up URLs containing whitespace. It's not trivial, and maybe that's why it's currently not supported. I may be one of the few who regularly break URLs across lines in text documents.
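As a rough illustration of the cleanup step, here is a minimal sketch that works independently of thingatpt.el. The function name `my-multiline-url-at-point` is hypothetical, and the pattern is my own simplification: it only handles URLs wrapped in angle brackets (with or without the `URL:` prefix) that start with a scheme, and it removes all whitespace inside them.

```elisp
(defun my-multiline-url-at-point ()
  "Return the angle-bracketed URL around point, whitespace removed, or nil.
Hypothetical helper for illustration; not part of thingatpt.el."
  (save-excursion
    (let ((pt (point))
          (case-fold-search t))
      ;; Step one character forward so that a \"<\" directly at point is found.
      (goto-char (min (1+ pt) (point-max)))
      (when (and (search-backward "<" nil t)
                 ;; Optional \"URL:\" prefix, then a scheme, then anything
                 ;; (including newlines) up to the closing \">\".
                 (looking-at
                  "<\\(?:URL:\\)?\\([a-z][a-z0-9+.-]*:[^<>]+\\)>")
                 ;; Point must lie inside the bracketed region.
                 (< pt (match-end 0)))
        ;; Drop the whitespace that was added to break the URL across lines.
        (replace-regexp-in-string
         "[ \t\r\n]+" "" (match-string-no-properties 1))))))

;; Example: point is placed inside the wrapped URL.
(with-temp-buffer
  (insert "pick it up from <URL:ftp://ds.in\nternic.net/rfc>.")
  (goto-char 20)
  (my-multiline-url-at-point))
;; => "ftp://ds.internic.net/rfc"
```

The result could then be passed to browse-url. The sketch does not handle quoted URLs such as "http://www.w3.org/Addressing/", and it cannot tell a deliberate space from one introduced by line wrapping, which is part of why a proper fix is harder.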

feklee
  • 2
    `M-x report-emacs-bug` – phils Apr 13 '15 at 22:28
  • I found the proper function (in `thingatpt.el`) to fix this. I can submit the bug report with a patch if needed. – nanny Apr 14 '15 at 15:38
  • 1
    Ah, interesting. `thing-at-point-markedup-url-regexp` says "This kind of markup was formerly recommended as a way to indicate URIs, but as of RFC 3986 it is no longer recommended." so it might be intentional that `browse-url` doesn't recognise it. Probably still worth querying, but I would check the archives to see if there has already been discussion of the issue. – phils Apr 15 '15 at 02:09
  • 1
    `browse-url` actually handles that syntax fine in 24.3, but it doesn't do so in 24.5; so if you can't find an explanation for the change, definitely report it as a bug. – phils Apr 15 '15 at 02:42
  • 5
    @phils [I think I found the commit](http://git.savannah.gnu.org/cgit/emacs.git/commit/?id=6e5c1569e941d385d28466a337ece0322bfa93e7). The commit message just says "Disallow newlines." Not sure why this change was made. – nanny Apr 15 '15 at 13:46
  • 3
    @phils According to [RFC 3986](https://www.ietf.org/rfc/rfc3986.txt) the `URL:` prefix is indeed no longer recommended. The possible use of line-breaks, however, is still mentioned: *"In some cases, extra whitespace (spaces, line-breaks, tabs, etc.) may have to be added to break a long URI across lines. The whitespace should be ignored when the URI is extracted."* – feklee Apr 15 '15 at 14:34

0 Answers