The block always begins with: 00 PROGRAM

and it ends with XYZ followed by a blank line. The XYZ appears earlier too, in the block of lines, but the earlier lines are followed by more data. I just need that last line before the blank line. I have not found any code that seems to match my intent. I hope this will be an easy answer for someone!

I do want to keep the last line before the blank line. For example:

000-12-22
AB1
00   PROGRAM
01   INQUIRY
03   XYZ
04   XYZ
blank line
LINE VALUE
00456

Only this part should be deleted.

00   PROGRAM
01   INQUIRY
03   XYZ
985ranch
  • it is very much the same except this time I need to keep the last line of the block before the blank line. – 985ranch Aug 16 '16 at 23:07
  • Regarding title: I don't just want to extract the one line though. I want to delete the rest of the lines before it. Yes .. the blank line is always preceded by XYZ and the "range" or block always begins with 00 PROGRAM. – 985ranch Aug 16 '16 at 23:25
  • "I don't just want to extract the one line though. I want to delete the rest of the lines before it." Please clarify: does "extract" mean keep or remove? – John1024 Aug 16 '16 at 23:27

2 Answers

This type of range is a perfect use case for ex. I've written about ex rather a lot on this site; it is easily the best POSIX tool for scripted file edits.

Command:

If there is only one block you need to handle, use:

printf '%s\n' '/00 PROGRAM/' '.,/^$/-2d' x | ex file.txt

If there are potentially multiple blocks, use:

printf '%s\n' 'g/00 PROGRAM/.,/^$/-2d' x | ex file.txt

For testing, use %p instead of x:

printf '%s\n' '/00 PROGRAM/' '.,/^$/-2d' %p | ex file.txt
printf '%s\n' 'g/00 PROGRAM/.,/^$/-2d' %p | ex file.txt

This will print the whole buffer, rather than saving the contents of the buffer back to the file.


Illustration:

[vagrant@localhost ~]$ cat file.txt 
000-12-22
AB1
00 PROGRAM
01 INQUIRY
03 XYZ
04 XYZ

LINE VALUE
00456
[vagrant@localhost ~]$ printf '%s\n' '/00 PROGRAM/' '.,/^$/-2d' x | ex file.txt
[vagrant@localhost ~]$ cat file.txt
000-12-22
AB1
04 XYZ

LINE VALUE
00456
[vagrant@localhost ~]$


Explanation and comments:

You could use ex -c 'editingcommands' filename, but I have found that this creates more problems than it solves: if an error is encountered, ex won't quit but will hang waiting for user input. Additionally, there are potential portability issues with passing multiple commands to ex this way, as the common features that allow you to do so are not guaranteed by POSIX.

Instead, I usually pipe commands to ex from printf. This allows for easy newline separation of multiple commands by using %s\n as the format string to printf, and it leaves the file unchanged if there is an error, without hanging (e.g. if you try to edit a line greater than the last line of the file).

To test a command before actually editing the file, I use %p (print whole buffer) as the last command. Then I can tweak the command slightly and run it again and again, until I get the exact file contents I want. Once I am happy with the result, I change the %p to x and run the command one more time to actually save the changes to the file.

Here again is the command I gave as an answer to this question:

printf '%s\n' '/00 PROGRAM/' '.,/^$/-2d' x | ex file.txt

The printf command simply prints the three strings /00 PROGRAM/, .,/^$/-2d and x separated by newlines, like so:

[vagrant@localhost ~]$ printf '%s\n' '/00 PROGRAM/' '.,/^$/-2d' x
/00 PROGRAM/
.,/^$/-2d
x
[vagrant@localhost ~]$ 

These three lines are ex commands.

An overview of ex commands

An ex command has two parts: an address (line based), and a command.

If there is only an address, the cursor will move to that address (move to that line).

If there is only a command, the current line is used as the address.

An address can often be a range—an address, followed by a comma, followed by another address. This refers to all the lines from the first address to the second address.

An address can be a line number, but it doesn't have to be. It can also be a search pattern meaning, "The next line after the current line which matches this regex." You can do backward searches as well as forward searches.

You can even write an address that means, "Two lines before the instance of foo that occurs soonest after the instance of bar that most immediately precedes the current line." This would look like: ?bar?/foo/-2
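To make these address forms concrete, here is a small sketch that prints lines from a throwaway file. The file contents and variable names are invented for the demo, and the -s flag and the final q! are assumptions on my part, used so that only the output of the p (print) commands is captured and the file is never saved.

```shell
# Throwaway six-line file; the contents are invented for this demo.
demo=$(mktemp)
printf '%s\n' alpha bar beta gamma foo delta > "$demo"

# Each string below is one ex command (address or range, then p for "print"):
#   2,3p           a range: lines 2 through 3
#   /foo/p         a forward search: the next line matching "foo"
#   4p             a plain line number; this also makes line 4 the current line
#   ?bar?/foo/-2p  two lines before the first "foo" found after the nearest
#                  "bar" above the current line
addr_out=$(printf '%s\n' '2,3p' '/foo/p' '4p' '?bar?/foo/-2p' 'q!' | ex -s "$demo")
printf '%s\n' "$addr_out"

rm -f "$demo"
```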

Step by step

The command /00 PROGRAM/ is just an address, so it means "move the cursor to the first instance of the pattern '00 PROGRAM'."

The command .,/^$/-2d has two parts. The d at the end is the command, meaning "delete." The rest is the address.

The initial . is a special address that refers to the current line.

The pattern /^$/ is a regular expression for an empty line (start of line ^ immediately followed by end of line $). In this case it means the next empty line after the current cursor position.

The -2 means "two lines back."

All together, then, .,/^$/-2d means: "Delete the lines from the current line to the line two lines above the next empty line."

x simply means, save the buffer contents to file and exit the editor.
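The multiple-block form given near the top works the same way, except that the g (global) command runs .,/^$/-2d once for every line matching 00 PROGRAM, with that line as the current line each time. Here is a rough sketch on a two-block file; the sample data is borrowed from the other answer on this page, the temporary file and variable names are made up, and -s is again my own addition.

```shell
# Two-block sample, written to a temporary file so nothing real is touched.
work=$(mktemp)
printf '%s\n' '00 PROGRAM' 'some' 'XYZ discard this' 'data' 'XYZ keep this' '' \
    'other' '00 PROGRAM' 'more' 'XYZ keep this also' '' 'end' > "$work"

# For every "00 PROGRAM" line: delete from that line up to the line
# two lines above the next empty line. Then x saves the buffer and exits.
printf '%s\n' 'g/00 PROGRAM/.,/^$/-2d' 'x' | ex -s "$work"

# Show the edited file, then clean up.
after=$(cat "$work")
printf '%s\n' "$after"
rm -f "$work"
```

One caveat: if a block consisted of nothing but the 00 PROGRAM line followed directly by the empty line, the range would run backwards and ex would report an error. This sketch does not try to handle that case.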


I hope you find this useful. ex is an extremely powerful tool for text editing. It's the immediate predecessor of vi, which is the "visual editor." All ex commands can be run in vi as well.

Wildcard
  • Thanks Wildcard. This is a little outside my comfort zone, but I will read more about it and try to catch up to you. – 985ranch Sep 07 '16 at 22:37

Answer for revised question

Let's consider this test file:

$ cat File2
000-12-22
AB1
00 PROGRAM
01 INQUIRY
03 XYZ
04 XYZ

LINE VALUE
00456

Try this command:

$ sed '/00 PROGRAM/,/^$/{/./{h;d}; x; p; x;}' File2
000-12-22
AB1
04 XYZ

LINE VALUE
00456
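For anyone who finds the one-liner hard to read, here is the same sed script spread over several lines with comments. It is meant as an annotated sketch, not a different method; the temporary file and the variable names are made up for the demo.

```shell
# Recreate the test file from above in a temporary location.
file2=$(mktemp)
printf '%s\n' '000-12-22' 'AB1' '00 PROGRAM' '01 INQUIRY' '03 XYZ' '04 XYZ' \
    '' 'LINE VALUE' '00456' > "$file2"

result=$(sed '
# Everything from a "00 PROGRAM" line through the next empty line:
/00 PROGRAM/,/^$/{
    # A non-empty line: remember it in the hold space and print nothing.
    /./{
        h
        d
    }
    # Only the empty line that ends the block gets this far.
    # Swap the remembered line into the pattern space ...
    x
    # ... print it ...
    p
    # ... and swap the empty line back so it is printed as usual.
    x
}
' "$file2")
printf '%s\n' "$result"

rm -f "$file2"
```

Lines outside the range are never touched by the braces, so they pass through unchanged.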

Answer for original question

If I have understood correctly, you have a file that contains groups of lines that begin with a line containing 00 PROGRAM and end with a blank line, and you want the line before the blank line if it contains XYZ. If that is the case, try this:

sed -n '/00 PROGRAM/,/^$/{/./{h;d}; x;/XYZ/p}' file 

Example

Consider this sample file:

$ cat file
00 PROGRAM
some
XYZ discard this
data
XYZ keep this

other
00 PROGRAM
more
XYZ keep this also

end

This keeps just the XYZ lines that precede the blank line in a 00 PROGRAM block:

$ sed -n '/00 PROGRAM/,/^$/{/./{h;d}; x;/XYZ/p}' file 
XYZ keep this
XYZ keep this also

Alternative

Maybe you want to keep all lines outside of the group and also keep the last non-empty line of the group if it matches XYZ. In that case:

$ sed '/00 PROGRAM/,/^$/{/./{h;d}; x;/XYZ/!d}' file 
XYZ keep this
other
XYZ keep this also
end
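If the hold-space juggling is hard to follow, the same logic can be spelled out in awk. To be clear, this is a swapped-in technique rather than the sed approach above, and it is only a sketch: the variable names inblock and last, and the temporary file, are made up for illustration.

```shell
# Recreate the sample file in a temporary location.
sample=$(mktemp)
printf '%s\n' '00 PROGRAM' 'some' 'XYZ discard this' 'data' 'XYZ keep this' '' \
    'other' '00 PROGRAM' 'more' 'XYZ keep this also' '' 'end' > "$sample"

kept=$(awk '
    # A "00 PROGRAM" line opens a block.
    /00 PROGRAM/ { inblock = 1 }

    # An empty line closes the block: print the last remembered line
    # if it contains XYZ, and drop the empty line itself.
    inblock && $0 == "" {
        if (last ~ /XYZ/) print last
        inblock = 0
        last = ""
        next
    }

    # Inside a block: remember the line, print nothing yet.
    inblock { last = $0; next }

    # Outside any block: pass the line through unchanged.
    { print }
' "$sample")
printf '%s\n' "$kept"

rm -f "$sample"
```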
John1024
  • I do want to keep the last line before the blank line, and your answer did perform that part really well, but it also wipes out everything else in the file. For example:

    000-12-22

    AB1

    00 PROGRAM

    01 INQUIRY

    03 XYZ

    04 XYZ

    LINE VALUE

    00456

    Only this part should be deleted. 00 PROGRAM

    01 INQUIRY

    03 XYZ

    FYI ... Don I promise to read the posting instructions on my next question! :)

    – 985ranch Aug 16 '16 at 23:43
  • @user185029 (a) In that case, does the solution shown under Alternative do what you want? (b) Since comments do not display new lines, it is not clear to me what your example above means. – John1024 Aug 16 '16 at 23:47
  • Your answer did perform keeping the last line really well, but it also wipes out everything else in the file. I edited my question to provide an example per Don's comment. I hope it helps. Thank you for all the assistance so far! – 985ranch Aug 16 '16 at 23:53
  • @user185029 please update your question to explain what you really want. It's full of ambiguities and confusion. Formatting would help, too. – Chris Davies Aug 16 '16 at 23:54
  • @user185029 See updated answer for revised question. – John1024 Aug 17 '16 at 00:01
  • Actually the alternative John posted worked. I will make sure to give better examples in my initial question next time. Thank you guys! – 985ranch Aug 17 '16 at 00:02