SED: How to delete from the second match to the end of the file

I have the following file:

Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

fa2(RM02):      1     word1
                2     word2
                3     word3

I need to remove from the second match “(RM0”, including that line and everything below.
I need to delete this way because there are several files where “fa1 (RM01)” and “fa2 (RM02)” change places.

And I can’t delete by number of lines either because the number of items can change.

If your text is in blocks separated by blank lines, you can use this simple awk to print the first block:

$ awk -v RS= 'FNR==1' file
Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

Taking your post literally, you want to print the first block that contains RM0. Same method:

$ awk -v RS= '/RM0/ && cnt++<1' file
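Both commands above treat blank-line-separated blocks as records. If the blocks are not reliably separated by blank lines, a line-oriented awk variant (a different technique, not part of the answer above) can count matches and stop at the second one. This is a sketch assuming the literal text `(RM0` starts each block; the file name `file` and the sample data are illustrative:

```shell
# Recreate the sample input from the question (illustrative data).
cat > file <<'EOF'
Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

fa2(RM02):      1     word1
                2     word2
                3     word3
EOF

# [(] is a bracket expression matching a literal "(" without escaping.
# On the second matching line, exit stops awk before that line is printed,
# so it and everything after it are dropped.
awk '/[(]RM0/ && ++n == 2 { exit } { print }' file
```

Unlike the paragraph-mode version, this keeps the blank line that precedes the second match.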

This might work for you (GNU sed):

sed '/RM0/{x;/./Q;x;h}' file

On a match of RM0 set a flag in the hold space. On a subsequent match quit processing.

N.B. The hold space is empty at start of processing and the Q quits without printing the current line.
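If GNU sed is not available, the same hold-space flag can be combined with `-n` and the standard `q` command instead of `Q`. This is an adaptation, not part of the original answer; it uses only POSIX sed commands, so it is expected to work with non-GNU seds too. The file name and sample data are illustrative:

```shell
# Recreate the sample input from the question (illustrative data).
cat > file <<'EOF'
Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

fa2(RM02):      1     word1
                2     word2
                3     word3
EOF

# -n: nothing is printed unless p asks for it.
# On an RM0 line: x swaps pattern and hold space; if the hold space was
# non-empty (/./), this is the second match, so q quits without printing.
# Otherwise x swaps back and h stores the line as the "seen one" flag.
# The second -e prints every line that was not quit on.
sed -n -e '/RM0/{x;/./q;x;h;}' -e p file
```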

I need to remove from the second match “(RM0”, including that line and everything below

You can do this:

sed ':a; /(RM0/! { n; b a; }; :b; n; /(RM0/ Q; b b' < input

Explanation:

  • :a – label ‘a’
  • /(RM0/! { n; b a; } – for a line that does not match the (basic) regular expression (RM0, perform the following group of commands
    • n – (print the current line and) read the next line
    • b a – branch to label ‘a’
  • :b – label ‘b’
  • n – (print the current line and) read the next line
  • /(RM0/ Q – if the current line matches the (basic) regular expression (RM0 then stop processing input and exit without printing the current line
  • b b – branch to label ‘b’

That’s a bit unusual for a sed script in that it processes the whole input in a single sed cycle. It reads and outputs lines until it sees the first instance of the pattern, then it keeps outputting and reading lines until it reaches the second instance of the pattern, at which point it terminates.
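The question mentions files where the fa1 and fa2 blocks swap places. A quick check with the blocks reversed (illustrative data; GNU sed assumed because of `Q`) shows that the script keys on the second `(RM0` line, not on a particular block name:

```shell
# The question's sample with the two blocks swapped (illustrative data).
cat > input <<'EOF'
Name            ITEM  Description
fa2(RM02):      1     word1
                2     word2
                3     word3

fa1(RM01):      1     word1
                2     word2
                3     word3
EOF

# GNU sed: print everything before the second "(RM0" line,
# whichever block happens to come first.
sed ':a; /(RM0/! { n; b a; }; :b; n; /(RM0/ Q; b b' < input
```

Here the fa2 block is kept and the fa1 block is dropped.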

The Q is specific to GNU sed. If you don’t want to depend on that then you can use this variation:

sed -n ':a; /(RM0/! { p; n; b a; }; :b; p; n; /(RM0/ q; b b' < input

That’s functionally almost the same, but the -n option suppresses auto-printing the current line when the q command is executed. That also turns off auto-printing by the n command, and that is compensated by adding an explicit p (print) command before each n.
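Both one-liners also rely on GNU sed accepting a `;` right after a label or a `b` command. POSIX does not guarantee that (a label runs to the end of the line), so for other implementations such as BSD sed, a sketch of the `-n` version with each label and branch on its own line could look like this; the file name `input` and the sample data are illustrative:

```shell
# Recreate the sample input from the question (illustrative data).
cat > input <<'EOF'
Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

fa2(RM02):      1     word1
                2     word2
                3     word3
EOF

# Same logic as the -n/q one-liner, but ":" and "b" each end at a newline,
# which is the form POSIX specifies for labels.
sed -n '
:a
/(RM0/!{
  p
  n
  b a
}
:b
p
n
/(RM0/q
b b
' input
```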

If you want to modify the file itself, this is easy with ed:

printf "%s\n" '/(RM0/;//,$ d' w | ed -s file.txt

This first sets the current line to the first one matching the regular expression (RM0, then deletes everything from the next line that matches the same RE to the end of the file, and writes the modified file back to disk.

To print the first part of the file instead of editing it (for example, if this is just part of a longer pipeline):

printf "%s\n" '/(RM0/;//,$ d' ',p' Q | ed -s file.txt

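or, to print everything up to two lines before the second match (which, with this input, also drops the blank separator line):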

printf "%s\n" "/(RM0/;1,//-2 p" | ed -s input.txt

If awk is an option, you can easily delete the blocks that contain your target match.

cat samp.txt
Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

fa2(RM02):      1     word1
                2     word2
                3     word3
                4     looo4
                5     Loops
                6     Loop5

fa4(RM04):      1     word1
                2     word2
                3     word3

fa3(RM03):      1     word1
                2     word2
                3     word3

fa5(RM02):      1     word1
                2     word2
                3     word3
                4     looo4
                5     Loops
                6     Loop5

In this case, we want to remove every block that contains RM02. We can do so with the following awk code:

awk -v RS="\n\n" -v ORS="\n\n" '!/RM02/' "$file"

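Because the blocks are separated by blank lines, it is easy to split the input into records, either by assigning the regex \n\n to the RS built-in variable (multi-character RS is a GNU awk extension; POSIX leaves it unspecified) or by setting RS to the empty string "" (paragraph mode):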

awk -v RS="" -v ORS="\n\n" '!/RM02/' samp.txt
Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

fa4(RM04):      1     word1
                2     word2
                3     word3

fa3(RM03):      1     word1
                2     word2
                3     word3

We can achieve something similar with sed, as long as the block length is known:

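sed '/(RM02)/{N;N;//d;}' will again match RM02, but it only deletes the matching line plus the 2 lines that N appends to it (the empty regex // reuses (RM02), which still matches the combined pattern space). If you know the exact number of lines to be removed, this could be useful.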

sed '/(RM02)/{N;N;N;N;//d;}' samp.txt
Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

                6     Loop5

fa4(RM04):      1     word1
                2     word2
                3     word3

fa3(RM03):      1     word1
                2     word2
                3     word3

                6     Loop5

Because only four N commands were used, each RM02 line was deleted together with the next 4 lines, leaving 6 Loop5 behind. Adding one more N would remove that leftover line in this file, but the approach is tedious and breaks as soon as a block changes length.
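If block lengths vary, a range address avoids counting lines altogether. This is a different technique from the N-based command above: a sketch assuming each block ends at a blank line (or at the end of the file), run on a trimmed copy of samp.txt as illustrative data:

```shell
# Trimmed copy of samp.txt from above (illustrative data).
cat > samp.txt <<'EOF'
Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

fa2(RM02):      1     word1
                2     word2
                3     word3
                4     looo4
                5     Loops
                6     Loop5

fa4(RM04):      1     word1
                2     word2
                3     word3
EOF

# In a BRE, ( and ) are literal, so /(RM02)/ matches the text "(RM02)".
# Delete from each matching line through the next empty line; if no empty
# line follows, the range runs to the end of the file.
sed '/(RM02)/,/^$/d' samp.txt
```

Because the range includes the blank line, the separator after each deleted block goes with it, so no doubled blank lines are left behind.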

If this was your input:

$ cat file
Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

fa2(RM02):      1     word1
                2     word2
                3     word3

fa3(RM03):      1     word1
                2     word2
                3     word3

From “I need to remove from the second match "(RM0", including that line and everything below.” it sounds like you want one or the other of these, but it’s not clear from your question which:

$ awk -v RS= -v ORS='\n\n' '!(/\(RM0/ && (++c == 2))' file
Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

fa3(RM03):      1     word1
                2     word2
                3     word3

$ awk -v RS= -v ORS='\n\n' '!(/\(RM0/ && (++c >= 2))' file
Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

but it’s also possible that all you really need is one or the other of these:

$ awk -v RS= -v ORS='\n\n' 'NR!=2' file
Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

fa3(RM03):      1     word1
                2     word2
                3     word3

$ awk -v RS= -v ORS='\n\n' 'NR==1' file
Name            ITEM  Description
fa1(RM01):      1     word1
                2     word2
                3     word3

Or something else. Lots of possibilities…