sed: Portable solution to match "any character but newline"

Question

I can match newline by \n:

echo "one
two" | sed 'N;s/\n/_/g'

In GNU sed, I can use [^\n] to match any character but newline:

echo "one
two" | sed 'N;s/[^\n]/_/g'

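This is very handy, but it violates POSIX: inside a bracket expression the backslash loses its special meaning, so [^\n] means "any character but backslash or n". Other sed versions therefore correctly answer __n______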

The same goes for the tab character, but there I can work around it by typing an actual tab character, preceded by Ctrl-V. This doesn't work for newline, though:

echo "one
two" | sed 'N;s/[^
]/_/g'

gives me unbalanced brackets.

Using [^[:cntrl:]] only works while there are no other control characters I want to match.
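To illustrate that limitation, here is a minimal sketch. It assumes the C locale (forced with LC_ALL=C) and a made-up input containing a tab; the tab is a control character too, so it is skipped together with the newline:

```shell
# Assumption: C locale, where both tab and newline belong to [:cntrl:].
result=$(printf 'a\tb\nc\n' | LC_ALL=C sed 'N;s/[^[:cntrl:]]/_/g')
expected=$(printf '_\t_\n_')
# The letters become "_", but the tab survives just like the newline does.
[ "$result" = "$expected" ] && echo "tab and newline were both left untouched"
```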

So what's the correct way to match any character but newline in POSIX sed?

score 2 · Answer 1 · answered May 01 '17 at 11:56

Maybe I have not understood your question correctly, but I will take my chances and answer.

If you want to match everything except a newline, a simple regex dot . does this as long as sed processes the input line by line, because the pattern space then never contains a newline.

Let's try it with a non-GNU sed:

$ cat file5
home
help
variables
compatibility

$ sed 's/./_/g' file5
____
____
_________
_____________

$ echo "one
two
three
four" |sed 's/./_/g'
___
___
_____
____

By the way, your very first sed example:

echo "one
two" | sed 'N;s/\n/_/g'

matches only the next newline, not every newline:

$ echo "one
> two
> three
> four" |sed 'N;s/\n/_/g'
one_two
three_four
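If the goal is to replace every newline, one common approach is to keep appending lines until the last one has been read. A minimal sketch, assuming the arbitrary label name a, and split over several -e options because some sed implementations do not accept a semicolon after a label:

```shell
# Collect the whole input in the pattern space, then replace every embedded newline.
# $!N appends the next line unless we are on the last one; $!ba loops until then.
result=$(printf 'one\ntwo\nthree\nfour\n' |
  sed -e ':a' -e '$!N' -e '$!ba' -e 's/\n/_/g')
echo "$result"   # prints one_two_three_four
```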

George Vasiliou

Sorry, maybe my example was simplified too much. As long as a text is processed line by line, it's useless to match newline or the complement set. But sometimes you use things like sed 'H;1h;$! d;... and then do operations on the whole text in one buffer. In those cases it may be useful to match anything but newline. How to do that elegantly? – Philippos May 01 '17 at 12:04
@Philippos Even if it is like that (would be good to include a "buffer" example in your question) are you sure that using regex .* or .+ will not work? – George Vasiliou May 01 '17 at 12:09
Very sure. . matches anything but NUL character, including newline. I'm preparing another question where this is needed with buffer example. – Philippos May 01 '17 at 12:25
@Philippos Ok, better to see the buffer example. By the way if . matches everything except null you could pre-process the file like tr '\n' '\0' or even you could make it like sed '......' <(tr '\n' '\0' <file) (just thoughts) – George Vasiliou May 01 '17 at 12:30
Here is the example where I need this. I'm afraid in posix sed it is not possible to use \0 either. – Philippos May 01 '17 at 14:43
The first examples are missing the N, so there is never a newline in the pattern space. Also, it is the N that reads lines in pairs, not the regex; the regex will match all newlines. – Sep 23 '21 at 12:49
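The point made in the comments can be checked with a small sketch that reuses the question's two-line input. Once N has put a newline into the pattern space, the dot is expected to match it as well (this is the behaviour described above and what GNU sed does; other implementations may differ):

```shell
# N joins both lines in the pattern space, so "." also sees the embedded newline.
result=$(printf 'one\ntwo\n' | sed 'N;s/./_/g')
echo "$result"   # with GNU sed: one line of seven underscores, the newline is gone
```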

Kusalananda · Answer 2 · 2021-08-29T08:35:48.733

The POSIX specification for basic regular expressions does not allow \n to match a literal newline (my emphasis below):

The Shell and Utilities volume of POSIX.1-2017 specifies within the individual descriptions of those standard utilities employing regular expressions whether they permit matching of <newline> characters; if not stated otherwise, the use of literal <newline> characters or any escape sequence equivalent in either patterns or matched text produces undefined results.

Luckily, the specification for the sed utility contains the following text, which states otherwise:

The sed utility shall support the BREs described in XBD Basic Regular Expressions, with the following additions:

[...]

The escape sequence \n shall match a <newline> embedded in the pattern space. [...]

This allows sed to match a literal newline that has been embedded in the pattern space (from using e.g. N) using \n in a regular expression.

This leads me to believe that it would be fine to use [^\n] to match any single non-newline character. This is also what the sed implementations on GNU systems, OpenBSD, FreeBSD, and Plan 9 do.
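Since implementations differ, a script that depends on this reading could probe the local sed first. The following is only a sketch; the variable names and messages are made up for illustration:

```shell
# Feature probe: does this sed read [^\n] as "any character but newline"?
result=$(printf 'one\ntwo\n' | sed 'N;s/[^\n]/_/g')
expected=$(printf '___\n___')
if [ "$result" = "$expected" ]; then
  verdict="excludes only the newline"
else
  verdict="is read differently by this sed"
fi
printf '%s\n' "bracket expression with backslash-n: $verdict"
```

A script could then fall back to another technique when the probe reports a different reading.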

Thank you for this interpretation. It still conflicts with the statement "special characters [including backslash] shall lose their special meaning within a bracket expression." Anyhow, I didn't ask for POSIX, but for portability. Unfortunately, my idea of "portable" covers more than three implementations … preferably all modern ones. – Philippos, Aug 29 '21 at 08:28
The sed from heirloom (from AT&T) http://heirloom.sourceforge.net/tools.html doesn't allow a \n inside a [...]. — , Sep 23 '21 at 12:53

score 1 · Accepted Answer · answered May 01 '17 at 10:30

Actually, there's a very neat way to handle this scenario in regular sed: swap the newline with some regular character, say _, then use [^_], and then swap back. I had meant to post a solution to a problem that came up but never got around to it, so let me put it here:

sed -e '
   /./!b

   :loop
      $q; N
   /\n$/bloop

   h

   /\ncreate table/!{
      s/\(.*\)\n.*/\1/p
      g;s/.*\(\n\)/\1/;D
   }

   g

   y/\n_/_\n/
      s/^[^_]*/test/
   y/\n_/_\n/

' input.data

Problem statement for the above solution.
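Stripped down to the question's two-line input, the swap idea can be sketched as follows. The replacement character X is an arbitrary choice for illustration:

```shell
# Swap newline and "_" with y, match "anything but _", then swap back.
result=$(printf 'one\ntwo\n' | sed 'N;y/\n_/_\n/;s/[^_]/X/g;y/\n_/_\n/')
echo "$result"   # prints XXX on two separate lines
```

The replacement text must not contain the placeholder character itself, because the second y command would turn it into newlines.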

This works, but I'd rather call this a nasty workaround than a neat solution. Ain't there a more elegant way? — Philippos, May 01 '17 at 11:01

score 1 · Answer 4 · answered Mar 19 '19 at 19:02

You may use the [[:alnum:][:punct:][:blank:]] bracket expression:

echo "one
two" | sed 'N;s/[[:alnum:][:punct:][:blank:]]/_/g'

Outputs:

___
___

[:alnum:] matches all alphanumeric characters, [:punct:] matches all punctuation, and [:blank:] matches all horizontal whitespace. Vertical whitespace is left out and does not get matched.

See the online sed demo.
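A small sketch with a made-up input containing a tab and a space shows the horizontal whitespace being replaced while the newline stays; the C locale is assumed here so that the classes are predictable:

```shell
# Tab and space are in [:blank:], so they are replaced; the newline is not.
result=$(printf 'a\tb\nc d\n' | LC_ALL=C sed 'N;s/[[:alnum:][:punct:][:blank:]]/_/g')
echo "$result"   # prints ___ on two separate lines
```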

Wiktor Stribiżew

Thank you for the late answer. I think one should better use [[:print:]] instead, as in some locales there can be stuff like [:graph:], which is neither [:alnum:] nor [:punct:], but a problem remains if other control characters are allowed in the stream. Maybe [[:print:]^A-^K-^Z] with a literal TAB does the trick, but it's certainly less elegant than [^\n]. Thanks anyhow. – Philippos Apr 04 '19 at 12:40
@Philippos Ok, then maybe sed 'N;s/[^\x0A]/_/g' will work. I thought the task was related to vertical whitespace in general. – Wiktor Stribiżew Apr 04 '19 at 12:43
