SED: How to delete from the second match to the end of the file
I have the following file
Name ITEM Description
fa1(RM01): 1 word1
2 word2
3 word3
fa2(RM02): 1 word1
2 word2
3 word3
I need to remove from the second match “(RM0”, including that line and everything below.
I need to delete this way because there are several files where “fa1 (RM01)” and “fa2 (RM02)” change places.
And I can’t delete by number of lines either because the number of items can change.
If you have text in blocks separated by a blank line delimiter, you can use this simple awk to print the first block:
$ awk -v RS= 'FNR==1' file
Name ITEM Description
fa1(RM01): 1 word1
2 word2
3 word3
To take your post literally, you want to print the first block with RM0 in it. Same method:
$ awk -v RS= '/RM0/ && cnt++<1' file
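If the blocks are not reliably separated by blank lines, a line-oriented awk script can count matching lines instead of relying on paragraph mode. This is a minimal sketch, not one of the posted answers; it assumes the question's sample data is written to a scratch file created with mktemp, and the variable names are placeholders:

```shell
# Recreate the question's sample data in a scratch file (no blank lines needed).
tmp=$(mktemp)
printf '%s\n' \
  'Name ITEM Description' \
  'fa1(RM01): 1 word1' \
  '2 word2' \
  '3 word3' \
  'fa2(RM02): 1 word1' \
  '2 word2' \
  '3 word3' > "$tmp"

# Count lines containing the literal text "(RM0" and exit at the second one.
# The trailing 1 prints every line that is reached before the exit.
out=$(awk '/\(RM0/ && ++n == 2 { exit } 1' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Because the script exits before the print rule runs for the second matching line, that line and everything after it are dropped.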
This might work for you (GNU sed):
sed '/RM0/{x;/./Q;x;h}' file
On a match of RM0, set a flag in the hold space. On a subsequent match, quit processing.
N.B. The hold space is empty at the start of processing, and the Q command quits without printing the current line.
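To see the flag behave in both situations, the sketch below runs the same command on input with two matches and on input with only one. It assumes GNU sed (for Q) and uses a shortened, illustrative copy of the sample data:

```shell
tmp=$(mktemp)
printf '%s\n' 'Name ITEM Description' 'fa1(RM01): 1 word1' '2 word2' \
  'fa2(RM02): 1 word1' '2 word2' > "$tmp"

# Two matches: output stops just before the second RM0 line.
two=$(sed '/RM0/{x;/./Q;x;h}' "$tmp")

# One match only (first three lines): the flag is set but never triggers,
# so every input line is printed.
one=$(head -n 3 "$tmp" | sed '/RM0/{x;/./Q;x;h}')

printf '%s\n---\n%s\n' "$two" "$one"
rm -f "$tmp"
```

Both runs print the same three lines here, which shows that a file with fewer than two matches passes through unchanged.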
I need to remove from the second match “(RM0”, including that line and everything below
You can do this:
sed ':a; /(RM0/! { n; b a; }; :b; n; /(RM0/ Q; b b' < input
Explanation:
:a – label ‘a’
/(RM0/! { n; b a; } – for a line that does not match the (basic) regular expression (RM0, perform the following group of commands
n – (print the current line and) read the next line
b a – branch to label ‘a’
:b – label ‘b’
n – (print the current line and) read the next line
/(RM0/ Q – if the current line matches the (basic) regular expression (RM0, then stop processing input and exit without printing the current line
b b – branch to label ‘b’
That’s a bit unusual for a sed script in that it processes the whole input in a single sed cycle. It reads and outputs lines until it sees the first instance of the pattern, then it keeps outputting and reading lines until it reaches the second instance of the pattern, at which point it terminates.
The Q command is specific to GNU sed. If you don’t want to depend on that then you can use this variation:
sed -n ':a; /(RM0/! { p; n; b a; }; :b; p; n; /(RM0/ q; b b' < input
That’s functionally almost the same, but the -n option suppresses auto-printing the current line when the q command is executed. That also turns off auto-printing by the n command, and that is compensated by adding an explicit p (print) command before each n.
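As a quick check that the two forms agree, the sketch below runs both on the question's sample data and compares the results. It assumes GNU sed is installed (the first command needs Q), and the scratch file and variable names are placeholders:

```shell
tmp=$(mktemp)
printf '%s\n' 'Name ITEM Description' 'fa1(RM01): 1 word1' '2 word2' '3 word3' \
  'fa2(RM02): 1 word1' '2 word2' '3 word3' > "$tmp"

# GNU-only form: Q exits without printing the second matching line.
gnu=$(sed ':a; /(RM0/! { n; b a; }; :b; n; /(RM0/ Q; b b' < "$tmp")

# -n form: an explicit p before each n, and a plain q to stop.
plain=$(sed -n ':a; /(RM0/! { p; n; b a; }; :b; p; n; /(RM0/ q; b b' < "$tmp")

if [ "$gnu" = "$plain" ]; then
  echo "both variants print the same lines"
fi
printf '%s\n' "$plain"
rm -f "$tmp"
```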
If you want to modify the file itself, this is easy with ed:
printf "%s\n" '/(RM0/;//,$ d' w | ed -s file.txt
This first sets the current line to the first one matching the regular expression (RM0, then deletes everything from the next line that matches the same RE to the end of the file, and writes the modified file back to disk.
To print out the first part of the file instead of editing it, if this is just part of a longer pipeline:
printf "%s\n" '/(RM0/;//,$ d' ',p' Q | ed -s file.txt
or
printf "%s\n" "/(RM0/;1,//-2 p" | ed -s input.txt
If awk works, you can delete all lines from your target match and below easily.
cat samp.txt
Name ITEM Description
fa1(RM01): 1 word1
2 word2
3 word3
fa2(RM02): 1 word1
2 word2
3 word3
4 looo4
5 Loops
6 Loop5
fa4(RM04): 1 word1
2 word2
3 word3
fa3(RM03): 1 word1
2 word2
3 word3
fa5(RM02): 1 word1
2 word2
3 word3
4 looo4
5 Loops
6 Loop5
In this case, we want to remove every block that contains RM02, that is, the matching line and the lines below it in the same block. We can do so with the following awk code.
awk -v RS="\n\n" -v ORS="\n\n" '!/RM02/' "$file"
Because the blocks are separated by blank lines, it is easy to separate the records by assigning the \n\n regex, or an empty string "", to the RS built-in variable.
awk -v RS="" -v ORS="\n\n" '!/RM02/' samp.txt
Name ITEM Description
fa1(RM01): 1 word1
2 word2
3 word3
fa4(RM04): 1 word1
2 word2
3 word3
fa3(RM03): 1 word1
2 word2
3 word3
We can achieve something similar with sed:
sed '/(RM02)/{N;N;//d;}'
This once again matches RM02, but it deletes only the matching line and the 2 lines that follow it. If you know the exact number of lines to be removed, this ‘could’ be useful.
sed '/(RM02)/{N;N;N;N;//d;}' samp.txt
Name ITEM Description
fa1(RM01): 1 word1
2 word2
3 word3
6 Loop5
fa4(RM04): 1 word1
2 word2
3 word3
fa3(RM03): 1 word1
2 word2
3 word3
6 Loop5
Because only four N; commands were used, each RM02 line was deleted along with the 4 lines after it, leaving 6 Loop5 behind. Adding another N; to the script would work, but this approach gets tedious.
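Where each block ends with a blank line, a sed address range avoids counting N commands altogether. This is a minimal sketch under that assumption, not part of the original answer; the data below is a shortened, illustrative stand-in for samp.txt with blank lines between the blocks:

```shell
tmp=$(mktemp)
printf '%s\n' \
  'Name ITEM Description' \
  'fa1(RM01): 1 word1' \
  '2 word2' \
  '' \
  'fa2(RM02): 1 word1' \
  '2 word2' \
  '6 Loop5' \
  '' \
  'fa4(RM04): 1 word1' \
  '2 word2' > "$tmp"

# Delete from each line containing "(RM02)" through the next empty line,
# however many lines the block has. If no empty line follows, the range
# runs to the end of the input.
out=$(sed '/(RM02)/,/^$/d' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

The blank line that closes the deleted block is removed with it, so the remaining blocks stay separated by a single blank line.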
If this was your input:
$ cat file
Name ITEM Description
fa1(RM01): 1 word1
2 word2
3 word3
fa2(RM02): 1 word1
2 word2
3 word3
fa3(RM03): 1 word1
2 word2
3 word3
From “I need to remove from the second match "(RM0", including that line and everything below.” it sounds like you want one or the other of these, but it’s not clear which from your question:
$ awk -v RS= -v ORS='\n\n' '!(/\(RM0/ && (++c == 2))' file
Name ITEM Description
fa1(RM01): 1 word1
2 word2
3 word3
fa3(RM03): 1 word1
2 word2
3 word3
$ awk -v RS= -v ORS='\n\n' '!(/\(RM0/ && (++c >= 2))' file
Name ITEM Description
fa1(RM01): 1 word1
2 word2
3 word3
but it’s also possible that all you really need is one or the other of these:
$ awk -v RS= -v ORS='\n\n' 'NR!=2' file
Name ITEM Description
fa1(RM01): 1 word1
2 word2
3 word3
fa3(RM03): 1 word1
2 word2
3 word3
$ awk -v RS= -v ORS='\n\n' 'NR==1' file
Name ITEM Description
fa1(RM01): 1 word1
2 word2
3 word3
Or something else. Lots of possibilities…