How can I remove all punctuation from a file using sed, with the exception of certain characters? Specifically, I want to keep these characters:

@-_$%

I am currently using this to remove all punctuation, but I am not sure how to modify it to keep those characters:

cat input.txt | sed -e "s/[[:punct:]]\+//g" > output.txt

Alternatively, how can I remove only certain punctuation? Like:

.!?,'/\"()[]^*
jay
  • 123

3 Answers

To remove only the characters:

.!?,'/\"()[]^*

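Use a character class like so:

[].!?,'/\\"()^*[]

Note that the ] character must be first so that it is taken literally. Also, the ^ cannot be first, since there it would negate the whole class. The [ goes last, because a [ followed directly by ., : or = inside a class opens a collating symbol, character class or equivalence class instead of standing for itself. And the backslash is doubled; POSIX treats a backslash inside a bracket expression as a literal character, so the doubling is harmless.

Now, to actually use this character class, you have to get it to sed. One way to do that is to put

s/[].!?,'/\\"()^*[]\+//g

in a file, and call it with sed -f scriptfile input.txt.

Another (trickier) way is to use shell quoting, ending the single-quoted string, adding an escaped single quote, and reopening the string:

sed -e 's/[].!?,'\''/\\"()^*[]\+//g' input.txt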
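As a quick check of the shell-quoting approach, here is a minimal sketch assuming GNU sed; the sample line is made up for illustration. It keeps ] first and puts [ last in the class so that [ is never followed by a dot, and it leaves out \+ (a GNU extension), which is not needed when the replacement is empty and the g flag is set.

```shell
# Hypothetical sample line containing every character from the list,
# followed by the characters that should survive.
sample='a.b!c?d,e'\''f/g\h"i(j)k[l]m^n*o @-_$% kept'

# ] first, [ last, ^ not first; '\'' inserts a literal single quote.
cleaned=$(printf '%s\n' "$sample" | sed -e 's/[].!?,'\''/\\"()^*[]//g')

printf '%s\n' "$cleaned"
# prints: abcdefghijklmno @-_$% kept
```

To process a file as in the question, the same expression can be run as sed -e '...' input.txt > output.txt.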

For the other part of your question: a POSIX bracket expression has no subtraction, so there is no direct way to match every character in a class such as [[:punct:]] except certain listed characters.

You can, however, match all NON-punctuation characters like so:

[^[:punct:]]
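One workaround, sketched here on the assumption that only ASCII punctuation matters and that GNU sed is in use, is to spell out the punctuation characters to delete, leaving out the ones to keep (@-_$%). The script goes into a temporary file so that the quotes, backtick and backslash need no shell escaping; the sample line is hypothetical.

```shell
# Write the sed script to a temporary file; the quoted 'EOF' keeps the
# shell from touching the quotes, backtick and backslashes.
script=$(mktemp)
cat > "$script" <<'EOF'
s/[]!"#&'()*+,./:;<=>?\\^`{|}~[]//g
EOF

# Hypothetical sample line.
sample='user@example-host: cost=$5 (50%) file_name.txt!'
result=$(printf '%s\n' "$sample" | sed -f "$script")
rm -f "$script"

printf '%s\n' "$result"
# prints: user@example-host cost$5 50% file_nametxt
```

The class lists all ASCII punctuation except @, -, _, $ and %, with ] first and [ last for the reasons given above.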
Wildcard
  • 36,499

sed approach:

Sample file contents:

.!?,'/\"()[]^* @-$%
.!?,'/\"()[]^* @ sdfsd %
as,,d//asd a?sd %%   --@_ _asdasdad$
sdfsdf %''%!% 2 + 2 = (?)

sed 's/[^[:alnum:][:space:]@_$%-]//g' file

The output:

 @-$%
 @ sdfsd %
asdasd asd %%   --@_ _asdasdad$
sdfsdf %%% 2  2  

You can do that very easily using perl6 (the language has since been renamed Raku):

perl6 -pe 's:g/<:punct-[-@_%]>+//' file
  • <:punct-[-@_%]> will match any punctuation character, except -@_%.
  • :g is the global switch (like s/foo/bar/g in perl5 or sed)

To allow comparison between answers (and also because I'm lazy), I'll reuse @RomanPerekhrest's sample input:

.!?,'/\"()[]^* @-$%
.!?,'/\"()[]^* @ sdfsd %
as,,d//asd a?sd %%   --@_ _asdasdad$
sdfsdf %''%!% 2 + 2 = (?)

So this line:

perl6 -pe 's:g/<:punct-[-@_%]>+//' file

Gives:

^ @-$%
^ @ sdfsd %
asdasd asd %%   --@_ _asdasdad$
sdfsdf %%% 2 + 2 = 

Note that this differs from the output in @RomanPerekhrest's answer: ^, + and = are kept, because Unicode classifies them as symbols rather than punctuation. If you want ^, = and + removed too, you can use the following line:

perl6 -pe 's:g/<:punct-[-@_%]+[^+=]>+//' file

The output is then the same as in that answer:

 @-$%
 @ sdfsd %
asdasd asd %%   --@_ _asdasdad$
sdfsdf %%% 2  2  
abitmol
  • 169