1

How to use sed to cut text starting with x spaces and ending with y spaces?

For instance, this is my string:

 kkk 111 fff      aaabbb 5d98 ccc         mmmppp 9369d

and I want to get this output:

 aaabbb 5d98 ccc

(the number of spaces is not known)

Thank you.

pr.nizar
  • 194
  • Do you want to print every word in the file that has the same number of spaces on either side? – PM 2Ring Jan 13 '15 at 13:49
  • Seems like an XY problem... you actually want the title (with possibly several words), am I right? – Olivier Dulac Jan 13 '15 at 15:52
  • @Olivier Dulac: Yes, you're right, the title can have several words; I forgot to specify this. I'm editing my question so it reflects this. – pr.nizar Jan 13 '15 at 19:42
  • Could you add multiple examples of the actual lines you want to extract the title from? If you can, include the most difficult ones as well as several standard ones. That will make your actual requirements a lot clearer, which will make it easier to give you a useful answer. :-) Thanks in advance! – Esteis Jan 14 '15 at 07:06
  • Thank you. I've edited my question so it reflects the most accurate use case. String: kkk 111 fff aaabbb 5d98 ccc mmmppp 9369d, result: aaabbb 5d98 ccc – pr.nizar Jan 14 '15 at 16:11

5 Answers

1

The question is about text that starts after an unknown number of spaces, so

sed 's/.* \{2,\}\([[:alnum:]].*\) \{2,\}.*/\1/'

or with -r (-E)

sed -E 's/.* {2,}([[:alnum:]].*) {2,}.*/\1/'

seems to be enough, but grep is better in this case:

grep -Po ' {2,}\K[[:alnum:]].*(?= {2,})'

A less strict variant (it only looks for two spaces on each side) also works:

sed -E 's/.*  (\w.*)  .*/\1/'
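Because `.*` is greedy, the capture in the patterns above can keep some of the spaces that follow the title when the trailing run is longer than two. A minimal sketch of the tightened form suggested in the comments below, forcing the capture to start and end on an alphanumeric character. The helper name `extract_middle` and the `printf`-built sample line are assumptions for illustration, and the title must be at least two characters long for this pattern to match:

```shell
# Sample line from the question, built with printf so the run lengths are explicit:
# 6 spaces before the wanted text, 9 spaces after it.
line=$(printf ' kkk 111 fff%6saaabbb 5d98 ccc%9smmmppp 9369d' '' '')

# extract_middle is a hypothetical helper name, not part of the answer.
# The capture must begin and end with an alphanumeric character, so the
# spaces after the title are left to the " {2,}" that follows the group.
extract_middle() {
  sed -E 's/.* {2,}([[:alnum:]].*[[:alnum:]]) {2,}.*/\1/'
}

printf '%s\n' "$line" | extract_middle
```

With the sample line this should print only `aaabbb 5d98 ccc`, without trailing spaces.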
Costas
  • 14,916
  • Thank you but I'm sorry no one worked for me.. I think because of the use case I gave.. Actually even the title contains spaces: 01 Lorem Ipsum Chapter 01 – pr.nizar Jan 13 '15 at 19:33
  • @pr.nizar This is because you use tabs instead of spaces, so sed -E 's/.*\t(.+)\t.*/\1/' will help you – Costas Jan 13 '15 at 20:52
  • No actually I used 6 spaces before and after the title but echo '01 Lorem Ipsum Chapter 01' | sed 's/.* \{2,\}\([[:alnum:] ]\{2,\}\) \{2,\}.*/\1/' gives nothing as output. Same thing for the 2nd and 4th. The 3rd gave me Lorem Ipsum Chapter. – pr.nizar Jan 13 '15 at 21:06
  • No, let's say alphanumeric of any length. The only delimiters are n spaces before and after the title. Let's say this string: kkk 111 fff aaabbb 5d98 ccc mmmppp 9369d; I want to get aaabbb 5d98 ccc as result. I'm sorry I'm a total noob at RegEx. – pr.nizar Jan 13 '15 at 21:27
  • Actually the amount of spaces is greater than one space but it's random n spaces. – pr.nizar Jan 13 '15 at 21:43
  • @pr.nizar Make sure there are 2 or more spaces: sed -E 's/.* {2,}(\w.+\w) {2,}.*/\1/' – Costas Jan 13 '15 at 22:12
  • Yep that is the correct answer! You did it! Thank you very much! Can you edit your answer so I tick it as the correct answer? – pr.nizar Jan 13 '15 at 23:31
  • @pr.nizar Edited. – Costas Jan 14 '15 at 09:05
0

You can use the -r option for extended regular expressions, where a repetition count can be specified inside {}, so the following prints the word surrounded by 6 spaces on each side:

sed -r 's/.* {6}(\w*) {6}.*/\1/'

If the title contains spaces too, a better choice would be

sed -r 's/.* {6}(.*) {6}.*/\1/'
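When a run is longer than 6 spaces, the fixed-width pattern leaves the surplus spaces inside the capture. One way to handle that, sketched here under the assumption that the title itself never starts or ends with a space, is to trim the result in the same sed call. `extract_and_trim` is a hypothetical helper name, and `-E` is used as the portable spelling of GNU sed's `-r`:

```shell
# Sample line from the question: 6 spaces before the title, 9 after it.
line=$(printf ' kkk 111 fff%6saaabbb 5d98 ccc%9smmmppp 9369d' '' '')

# extract_and_trim is a hypothetical helper name.
# 1st expression: the fixed-width pattern from the answer.
# 2nd and 3rd expressions: strip spaces left at either end of the capture.
extract_and_trim() {
  sed -E -e 's/.* {6}(.*) {6}.*/\1/' -e 's/^ +//' -e 's/ +$//'
}

printf '%s\n' "$line" | extract_and_trim
```

Here the first expression alone would keep three trailing spaces; the trimming expressions remove them.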
jimmij
  • 47,140
  • Thank you but I'm sorry this didn't work for me.. I think because of the use case I gave.. Actually even the title contains spaces: 01 Lorem Ipsum Chapter 01 – pr.nizar Jan 13 '15 at 19:34
  • @pr.nizar try with sed -r 's/.* {6}(.*) {6}.*/\1/'. This must work; however, it can match more than you may want (for example, it includes the additional spaces in the result if more than 6 spaces surround the title). – jimmij Jan 13 '15 at 22:09
0

Edit: I've borrowed the -r flag (enables extended regex syntax) from jimmij to cure backslashitis.

The following works, under the following conditions:

  • you are willing to say that the field separator is at least n spaces, e.g. 3
  • the contents of the field of interest do not include a space anywhere.

In that case, this regex works:

    echo ' 01      Title      Chapter 01' |
    sed -r 's/^.* {3,}([^ ]+) {3,}.*$/\1/'

Or, in case you like your backslashes, this is what this looks like in non-extended regex syntax:

    echo ' 01      Title      Chapter 01' |
    sed 's/^.* \{3,\}\([^ ]\+\) \{3,\}.*$/\1/'

Explanation of the regex:

^        start of line
.*       any number of characters at the start of the line
 {3,}    at least 3 spaces
([^ ]+)  1 or more non-space characters (capture this group as \1)
 {3,}    at least 3 spaces
.*       anything on the rest of the line
$        end of the line. Not needed, because of the .*, but nicely explicit.
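If the field of interest may contain spaces, the second condition can be relaxed. A possible sketch, assuming the separator is still at least 3 spaces and that words inside the field are separated by at most 2 spaces; the helper name `extract_spaced_field` is made up for illustration:

```shell
# Sample line with a two-word title and 6 spaces on each side of it.
line=$(printf ' 01%6sLorem Ipsum%6sChapter 01' '' '')

# extract_spaced_field is a hypothetical helper name.
# [^ ]+( {1,2}[^ ]+)* matches one word followed by any number of further
# words, each preceded by 1 or 2 spaces, so a run of 3+ spaces ends the field.
extract_spaced_field() {
  sed -E 's/^.* {3,}([^ ]+( {1,2}[^ ]+)*) {3,}.*$/\1/'
}

printf '%s\n' "$line" | extract_spaced_field
```

For the sample line this should print `Lorem Ipsum`.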
Esteis
  • 374
  • Thank you but I'm sorry this didn't work for me.. I think because of the use case I gave.. Actually even the title contains spaces: 01 Lorem Ipsum Chapter 01 ... – pr.nizar Jan 13 '15 at 19:38
0

Assuming you want the same number of spaces on either side:

$ sed -r 's/(^|.*[^[:space:]])([[:space:]]+)([^[:space:]]+)\2([^[:space:]].*|$)/\3/g' <<<"01      Title      Chapter 01"
Title

(I used the character class [[:space:]]; with just a literal space, the expression is considerably shorter: sed -r 's/(^|.*[^ ])( +)([^ ]+)\2([^ ].*|$)/\3/g').

By using the backreference \2 within the left-hand side, we ensure that the same number of spaces is present on both sides.

muru
  • 72,889
  • Thank you but I'm sorry it worked partially.. I think because of the use case I gave.. Actually even the title contains spaces: 01 Lorem Ipsum Chapter 01.. For this I get only Lorem.. – pr.nizar Jan 13 '15 at 19:34
0

I believe you are trying to capture the title?

Here is a way to do that by getting rid of the first word and the last 2 words, and displaying the rest (spaces included):

awk '{ $1=""; $(NF-1)="" ; $NF="" ; print $0}'

Or even better: get rid of the first element, discard the last 2, and also drop the extra spaces (changing a $n, or NF, forces a rebuild of $0 on most awk implementations). Note that awk has no shift statement, so the first field is blanked with $1="" instead:

awk '{ $1=""; NF=NF-2; print $0 }'

example

$   echo   ' 01      Title is   here!     Chapter 01' | awk '{ $1=""; NF=NF-2; print $0 }'

 Title is here!

The advantage of awk is that it is easy to add tests (is $1 an integer? is $(NF-1) "Chapter" ? etc)
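For the edited example in the question, where the number of words to discard is not known, a different awk approach is to split on runs of two or more spaces instead of counting words. This is only a sketch under the assumption that single spaces never act as delimiters and that the line does not begin with two or more spaces; `middle_field` is a hypothetical helper name:

```shell
# Sample line from the question: 6 spaces before the title, 9 after it.
line=$(printf ' kkk 111 fff%6saaabbb 5d98 ccc%9smmmppp 9369d' '' '')

# middle_field is a hypothetical helper name.
# The field separator is a regular expression: one space followed by one or
# more spaces, i.e. any run of at least two spaces. Single spaces stay
# inside the fields, so the title is simply the second field.
middle_field() {
  awk -F '[ ][ ]+' '{ print $2 }'
}

printf '%s\n' "$line" | middle_field
```

With the sample line the fields are ` kkk 111 fff`, `aaabbb 5d98 ccc` and `mmmppp 9369d`, so the second one is the title.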

  • Thank you but this outputs 01 Title is here! and not Title is here! – pr.nizar Jan 13 '15 at 19:40
  • @pr.nizar : then add a $2="" or, better, a shift. and now with the modified example, this solution is not valid anymore unless you know how many items to discard – Olivier Dulac Jan 14 '15 at 01:40