5

I can match a newline with \n:

echo "one
two" | sed 'N;s/\n/_/g'

In GNU sed, I can use [^\n] to match any character but newline:

echo "one
two" | sed 'N;s/[^\n]/_/g'

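This is very handy, but it violates POSIX. Other sed versions correctly answer _n_____, because inside a bracket expression the backslash is an ordinary character, so [^\n] means "anything but backslash and n".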

The same goes for the tab character, but there I can work around it by typing an actual tab character, preceded by Ctrl-V. This doesn't work for newline, though:

echo "one
two" | sed 'N;s/[^
]/_/g'

gives me an "unbalanced brackets" error.

Using [^[:cntrl:]] only works while there are no other control characters I want to match.

So what's the correct way to match any character but newline in POSIX sed?

Philippos

4 Answers

2

Maybe I have not understood your question correctly, but I will take my chances and answer.

If you want to match everything except a newline, a simple regex dot (.) does exactly this: it matches any character except a newline.

Let's try it with a non-GNU sed:

$ cat file5
home
help
variables
compatibility

$ sed 's/./_/g' file5
____
____
_________
_____________

$ echo "one
two
three
four" |sed 's/./_/g'
___
___
_____
____

By the way, your very first sed example:

echo "one
two" | sed 'N;s/\n/_/g'

matches only the next newline, not every newline:

$ echo "one
> two
> three
> four" |sed 'N;s/\n/_/g'
one_two
three_four
  • Sorry, maybe my example was simplified too much. As long as a text is processed line by line, it's useless to match newline or the complement set. But sometimes you use things like sed 'H;1h;$! d;... and then do operations on the whole text in one buffer. In those cases it may be useful to match anything but newline. How to do that elegantly? – Philippos May 01 '17 at 12:04
  • @Philippos Even if it is like that (would be good to include a "buffer" example in your question) are you sure that using regex .* or .+ will not work? – George Vasiliou May 01 '17 at 12:09
  • Very sure. . matches anything but NUL character, including newline. I'm preparing another question where this is needed with buffer example. – Philippos May 01 '17 at 12:25
  • @Philippos Ok, better to see the buffer example. By the way if . matches everything except null you could pre-process the file like tr '\n' '\0' or even you could make it like sed '......' <(tr '\n' '\0' <file) (just thoughts) – George Vasiliou May 01 '17 at 12:30
  • Here is the example where I need this. I'm afraid in posix sed it is not possible to use \0 either. – Philippos May 01 '17 at 14:43
  • The first examples are missing the N, so no newline ever ends up in the pattern space. Also, it is the N that reads lines in pairs, not the regex; the regex will match all newlines. –  Sep 23 '21 at 12:49
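
To make the "whole text in one buffer" case from the comments concrete, here is a minimal sketch of the H;1h;$!d idiom followed by x. The function name join_lines is made up for illustration and is not from the thread. It also shows that, once all newlines are in the pattern space, s/\n/_/g replaces every one of them.

```shell
# Hypothetical helper: collect all of stdin into one buffer, then join lines.
join_lines() {
    # H    append the current line to the hold space
    # 1h   on line 1, overwrite instead, so there is no leading newline
    # $!d  on every line but the last, discard and start the next cycle
    # x    on the last line, swap the collected text into the pattern space
    sed 'H;1h;$!d;x;s/\n/_/g'
}

printf 'one\ntwo\nthree\nfour\n' | join_lines
# prints: one_two_three_four
```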
2

The POSIX specification for basic regular expressions does not allow \n to match a literal newline (my emphasis below):

The Shell and Utilities volume of POSIX.1-2017 specifies within the individual descriptions of those standard utilities employing regular expressions whether they permit matching of <newline> characters; if not stated otherwise, the use of literal <newline> characters or any escape sequence equivalent in either patterns or matched text produces undefined results.

Luckily, the specification for the sed utility contains the following text, which states otherwise:

The sed utility shall support the BREs described in XBD Basic Regular Expressions, with the following additions:

[...]

  • The escape sequence \n shall match a <newline> embedded in the pattern space. [...]

This allows sed to match a literal newline that has been embedded in the pattern space (from using e.g. N) using \n in a regular expression.

This leads me to believe that it would be fine to use [^\n] to match any single non-newline character. This is also what the sed implementations on GNU systems, OpenBSD, FreeBSD, and Plan 9 do.
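
Since implementations differ on this point, a script could probe the local sed before relying on [^\n]. The following is only a sketch; the function name sed_bracket_newline and its three result words are assumptions made for illustration.

```shell
# Hypothetical helper: report how the local sed reads \n inside brackets.
sed_bracket_newline() {
    # Feed a single "n" and try to replace [\n] with X.
    # Errors are silenced; a sed that rejects the expression prints nothing.
    result=$(printf 'n\n' | sed 's/[\n]/X/' 2>/dev/null)
    case "$result" in
        X) echo "literal" ;;      # [\n] meant "backslash or n"
        n) echo "newline" ;;      # [\n] meant "newline", so n survived
        *) echo "unsupported" ;;  # sed rejected the expression
    esac
}

sed_bracket_newline
```

A result of "newline" suggests [^\n] will behave as in the question's GNU example; any other result means a workaround is needed.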

Kusalananda
  • Thank you for this interpretation. It still conflicts with the statement "special characters [including backslash] shall lose their special meaning within a bracket expression." Anyhow, I didn’t ask for POSIX, but for portability. Unfortunately, my idea of "portable" covers more than three implementations … preferably all modern ones. – Philippos Aug 29 '21 at 08:28
  • The sed from heirloom (from AT&T) http://heirloom.sourceforge.net/tools.html doesn't allow a \n inside a [...]. –  Sep 23 '21 at 12:53
1

Actually, there's a very neat way to handle this scenario in regular sed: swap newline with some ordinary character, say _, match with [^_], and then swap back. I had meant to post this as a solution to another problem that came up but never got around to it, so let me put it here:

sed -e '
   /./!b

   :loop
      $q; N
   /\n$/bloop

   h

   /\ncreate table/!{
      s/\(.*\)\n.*/\1/p
      g;s/.*\(\n\)/\1/;D
   }

   g

   y/\n_/_\n/
      s/^[^_]*/test/
   y/\n_/_\n/

' input.data

Problem statement for the above solution.
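
Stripped of the surrounding logic, the swap trick can be sketched on a two-line input as below. The function name mask_non_newline and the replacement character X are chosen for illustration only. The y command is used because POSIX defines \n in its strings as a newline.

```shell
# Hypothetical helper: replace every non-newline character with X
# in a two-line pattern space, without writing [^\n].
mask_non_newline() {
    # N            join the next line, embedding a newline
    # y/\n_/_\n/   swap newline and underscore
    # s/[^_]/X/g   "_" now stands for newline, so this hits everything else
    # y/\n_/_\n/   swap back
    sed 'N
y/\n_/_\n/
s/[^_]/X/g
y/\n_/_\n/'
}

printf 'a_b\ncd\n' | mask_non_newline
# prints:
# XXX
# XX
```

The underscore in the input is masked too, because it is a newline while the substitution runs. The replacement text must not contain _ or a newline, since the final y would swap those as well.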

  • This works, but I'd rather call this a nasty workaround than a neat solution. Ain't there a more elegant way? – Philippos May 01 '17 at 11:01
1

You may use [[:alnum:][:punct:][:blank:]] bracket expression:

echo "one
two" | sed 'N;s/[[:alnum:][:punct:][:blank:]]/_/g'

Outputs:

___
___

[:alnum:] matches all alphanumeric characters, [:punct:] matches all punctuation, and [:blank:] matches all horizontal whitespace. Vertical whitespace is left out and does not get matched.

See the online sed demo.

  • Thank you for the late answer. I think one should better use [[:print:]] instead, as in some locales there can be stuff like [:graph:], which is neither [:alnum:] nor [:punct:], but a problem remains if other control characters are allowed in the stream. Maybe [[:print:]^A-^K-^Z] with a literal TAB does the trick, but it's certainly less elegant than [^\n]. Thanks anyhow. – Philippos Apr 04 '19 at 12:40
  • @Philippos Ok, then maybe sed 'N;s/[^\x0A]/_/g' will work. I thought the task was related to vertical whitespace in general. – Wiktor Stribiżew Apr 04 '19 at 12:43
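
The limitation raised in the comments can be seen in a small sketch: a control character such as \001 is in none of the three classes, so it is left alone. The function name mask_visible_classes is made up for illustration, LC_ALL=C is an assumption to keep the class definitions predictable, and tr only makes the control character visible as "?".

```shell
# Hypothetical helper: the class-based bracket expression from this answer,
# with \001 turned into "?" afterwards so it shows up in the output.
mask_visible_classes() {
    LC_ALL=C sed 'N;s/[[:alnum:][:punct:][:blank:]]/_/g' | tr '\001' '?'
}

printf 'a\001b\ntwo\n' | mask_visible_classes
# prints:
# _?_
# ___
```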