
I want to delete a set of consecutive lines, everywhere in the file, but only where the entire pattern matches.

Pattern Description:

Line1:^[#]+ .*

Line2:^[[:space:]]*$

Line3:^-[[:space:]]*$

Line4:^[[:space:]]*$

Line5:^[#]+ .*$|^[-]+[[:space:]]*$

Note:

  1. Line3 can have trailing space(s) after the -
  2. Line2 and Line4 may contain only whitespace or be empty
  3. Line5 matches either ^[#]+ .*$ or ^[-]+[[:space:]]*$
  4. I don't want to delete the last line of the pattern, i.e. Line5 in the pattern description.
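The five line patterns can also be checked one line at a time instead of with a single multiline regex. A minimal sketch in Python, assuming the input is already split into lines without their terminators (drop_empty_sections is a hypothetical name, and [[:space:]] is approximated as [ \t]):

```python
import re

# The question's five line patterns. [[:space:]] is written as [ \t]
# because Python's re module has no POSIX character classes.
LINE1 = re.compile(r'#+ .*')             # heading
BLANK = re.compile(r'[ \t]*')            # Line2 and Line4
LINE3 = re.compile(r'-[ \t]*')           # lone "-"
LINE5 = re.compile(r'#+ .*|-+[ \t]*')    # heading or dash line


def drop_empty_sections(lines):
    """Hypothetical helper: drop Line1-Line4 wherever all five lines match.

    lines is a list of strings without line terminators; Line5 is kept.
    """
    out = []
    i = 0
    while i < len(lines):
        window = lines[i:i + 5]
        if (len(window) == 5
                and LINE1.fullmatch(window[0])
                and BLANK.fullmatch(window[1])
                and LINE3.fullmatch(window[2])
                and BLANK.fullmatch(window[3])
                and LINE5.fullmatch(window[4])):
            i += 4   # skip Line1-Line4; Line5 becomes the next candidate Line1
        else:
            out.append(lines[i])
            i += 1
    return out
```

Because only four lines are skipped per match, a Line5 heading is re-examined as the next Line1, so back-to-back empty sections are all removed in one pass. Call it as '\n'.join(drop_empty_sections(text.splitlines())).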

Example:

# Body

- Inside Body

# Summary

-

# Bibliography

- Read this book

Expected output:

# Body

- Inside Body

# Bibliography

- Read this book

Note: The provided solution works. Is it possible to write it more clearly, as follows?

e = '(^|\n)[#]+ .*\
    \n[\t ]*\
    \n-[\t ]*\
    \n[\t ]*\
    \n([#]+ .*|[-]+[\t ]*)\n'
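On the readability question: a backslash at the end of a line inside a Python string literal does continue the string, but the leading spaces of each continuation line become part of the pattern, so the layout above would demand extra spaces at the end of the matched lines. A sketch of one way to keep the one-line-per-pattern layout is to place adjacent raw string literals inside parentheses, which Python joins into a single string:

```python
import re

# Adjacent string literals inside parentheses are concatenated at compile
# time, so each line of the pattern can sit on its own source line without
# picking up the indentation.
e = (r'(^|\n)[#]+ .*'               # Line1: heading
     r'\n[\t ]*'                    # Line2: empty or whitespace only
     r'\n-[\t ]*'                   # Line3: lone "-"
     r'\n[\t ]*'                    # Line4: empty or whitespace only
     r'\n([#]+ .*|[-]+[\t ]*)\n')   # Line5: heading or dash line

pattern = re.compile(e)
```

The resulting string is identical to the one-line pattern, so pattern.sub(r'\1\2\n', text) behaves the same way.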

Also, how can the provided solution handle multiple occurrences of the multiline pattern?

Porcupine
  • Do you know which line terminator will be present? Also, would an answer using awk (or any other text processing tool) be acceptable? – goodguy5 Dec 18 '18 at 13:31
  • I would be happy if it's portable to both Windows and Unix; if that's not possible, Unix would be preferable. Other scripting languages such as awk, Python, or JavaScript are also fine. – Porcupine Dec 18 '18 at 13:33
  • Is this document in a known format? Does it have a parser? – Kusalananda Dec 19 '18 at 13:37
  • @Kusalananda Yes, I use a custom format to take notes in Markdown files (data files). I have created a script that removes unused elements of the format from a temporary copy of the data files and then renders the result with Pandoc. – Porcupine Dec 19 '18 at 14:23

1 Answer


A Python solution; it should work with Python 2 or 3. It reads from stdin and writes to stdout. About the only change I made was replacing [[:space:]] with [\t ], since Python's re module does not support POSIX character classes such as [[:space:]].

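#!/usr/bin/python3

import re
import sys

# Line1 heading, blank, lone "-", blank, then Line5 (heading or "-" line).
# Group 1 keeps the newline (or start of text) before Line1; group 2 keeps Line5.
e = r'(^|\n)[#]+ .*\n[\t ]*\n-[\t ]*\n[\t ]*\n([#]+ .*|[-]+[\t ]*)\n'
# re.sub replaces every non-overlapping match; write() adds no extra newline.
sys.stdout.write(re.sub(e, r'\1\2\n', sys.stdin.read()))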
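To check the multiple-occurrence case, the same call can be wrapped in a function (remove_empty_sections is a hypothetical name) and fed text with two separate empty sections. re.sub replaces every non-overlapping match in a single call, so no extra flag is needed:

```python
import re

# Same pattern and replacement as the script above.
E = r'(^|\n)[#]+ .*\n[\t ]*\n-[\t ]*\n[\t ]*\n([#]+ .*|[-]+[\t ]*)\n'


def remove_empty_sections(text):
    """Hypothetical wrapper around the re.sub call."""
    # One call handles every separate occurrence further down the text.
    return re.sub(E, r'\1\2\n', text)


sample = ('# A\n\n-\n\n# B\n\nbody\n\n'
          '# C\n\n-\n\n# D\n')
print(remove_empty_sections(sample), end='')
```

Both empty sections (# A and # C) are removed, leaving # B, body and # D. re.MULTILINE only changes what ^ and $ match; it does not change how many replacements re.sub makes.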
icarus
  • Clarification: can we do in-place substitution when reading from a file? – Porcupine Dec 18 '18 at 14:34
  • Linux doesn't have any primitives to do this, because the file changes length. So you need to write to a temporary file and rename it. See https://stackoverflow.com/questions/42429320/what-is-the-best-way-to-modify-a-text-file-in-place for examples of modules that do it. – icarus Dec 18 '18 at 14:44
  • Thanks, I did this and it works: python Code.py < ./Input.md > ./Output.md – Porcupine Dec 18 '18 at 15:04
  • Additionally, please see the Note at the end of the question. – Porcupine Dec 19 '18 at 13:05
  • I tried to do a global replacement using print(re.sub(e, '\1\2\n', sys.stdin.read(), flags=re.MULTILINE)), but it replaces only once. Could you please have a look? I am using Python 3.6. – Porcupine Dec 19 '18 at 13:29
  • Could you please explain why we need (^|\n), i.e. the \n at the beginning? Without it, it doesn't work. – Porcupine Dec 19 '18 at 14:32
  • @Nikhil What makes you think that this solution doesn't already handle global replacement? Can you edit the question to include an example of input and output that this solution fails on? – icarus Dec 20 '18 at 06:30
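Pulling the comment threads together, here is a sketch under stated assumptions, not the answerer's code. The answer's pattern consumes Line5, so when one section's Line5 is also the next section's Line1 (# A, empty, # B, empty, # C), a single pass removes only the first. The sketch swaps in a lookahead for Line5 and re.MULTILINE for the line anchor, which also makes the (^|\n) group unnecessary. (In the answer, one likely reason removing (^|\n) "doesn't work" is that the replacement refers to groups by number, so \2 no longer exists.) It also rewrites a file in place through a temporary file and a rename, as suggested above; strip_empty_sections and clean_in_place are hypothetical names, and the whole file is assumed to fit in memory.

```python
import os
import re
import shutil
import tempfile

# Line1-Line4 are consumed; Line5 is only checked with a lookahead, so it stays
# in the text and can act as Line1 of the next match.
# re.MULTILINE makes ^ match at every line start and $ at every line end.
EMPTY_SECTION = re.compile(
    r'^[#]+ .*\n'                  # Line1: heading
    r'[\t ]*\n'                    # Line2: empty or whitespace only
    r'-[\t ]*\n'                   # Line3: lone "-" plus optional spaces
    r'[\t ]*\n'                    # Line4: empty or whitespace only
    r'(?=[#]+ .*$|[-]+[\t ]*$)',   # Line5: heading or dash line, not consumed
    re.MULTILINE)


def strip_empty_sections(text):
    """Remove Line1-Line4 of every matching block, keeping Line5."""
    return EMPTY_SECTION.sub('', text)


def clean_in_place(path):
    """Hypothetical helper: rewrite path through a temporary file and a rename."""
    with open(path) as f:
        new_text = strip_empty_sections(f.read())
    # Create the temporary file in the same directory so the rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, 'w') as tmp:
            tmp.write(new_text)
        shutil.copymode(path, tmp_path)  # mkstemp makes the file owner-only
        os.replace(tmp_path, path)       # overwrites path on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)
        raise
```

With this pattern, the chained # A / # B / # C input collapses to just # C in one pass, whereas the answer's pattern leaves the # B section behind until it is run a second time.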