i have small amounts of text in html that i want to process. currently they are rendered with `shr-render-region` so that all the html is out of the way, then copied and processed.
this works fine except that the rendering inserts newlines according to the value of `shr-width`, and these newlines can't be removed with `replace-regexp-in-string` or any other function that i have tried. (`C-u C-x =` reports that they are Line Feed (`C-j`) newlines, but matching with `\n` fails.)
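for reference, a minimal sketch of the rendering step described above. the function name `my/html-to-text` and its `width` argument are made up for illustration and are not what mastodon.el actually uses; it also assumes an Emacs built with libxml2, which `shr-render-region` requires.

```elisp
(require 'shr)

;; Hypothetical helper, for illustration only.
(defun my/html-to-text (html &optional width)
  "Render HTML with shr in a temp buffer and return plain text.
WIDTH, if non-nil, is bound to `shr-width' while rendering."
  (with-temp-buffer
    (insert html)
    ;; `shr-width' controls where shr breaks lines when filling.
    (let ((shr-width (or width shr-width)))
      (shr-render-region (point-min) (point-max)))
    ;; Drop faces and other text properties, keep the characters.
    (buffer-substring-no-properties (point-min) (point-max))))
```

calling something like `(my/html-to-text "<p>some long post text ...</p>" 40)` returns the rendered text as a string, and that string is where the unwanted line breaks show up.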
is it possible to avoid inserting these when rendering with shr? or is there a way to strip them that i'm missing? perhaps i can cleanly extract the text some other way?
ideally paragraph breaks in the text (single blank lines) would be preserved, but no other newlines would interfere.
the text is currently variable pitch, i.e. `shr-use-fonts` is non-nil. but i have also tried setting it to nil and the newlines are still inserted.
EDIT:
an example of what i'm working with (it's posts from mastodon, i'm processing them in https://codeberg.org/martianh/mastodon.el):
<p>Thrilled to have coauthored the 1st version of the guidelines for conducting research on the <a href=\"https://mastodon.xyz/tags/Linux\" class=\"mention hashtag\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">#<span>Linux</span></a> kernel: <a href=\"https://github.com/torvalds/linux/commit/f09f6f9b69821c9efcf16e6b5b466ce9e263ca51\" rel=\"nofollow noopener noreferrer\" target=\"_blank\"><span class=\"invisible\">https://</span><span class=\"ellipsis\">github.com/torvalds/linux/comm</span><span class=\"invisible\">it/f09f6f9b69821c9efcf16e6b5b466ce9e263ca51</span></a> This is in the wake of the UMN incident and will hopefully help fellow sw.eng. scholars to enforce <a href=\"https://mastodon.xyz/tags/ethics\" class=\"mention hashtag\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">#<span>ethics</span></a> when studying the <a href=\"https://mastodon.xyz/tags/kernel\" class=\"mention hashtag\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">#<span>kernel</span></a> community.</p>