sed regex on multiple lines can't capture all

I have this text file (example):

<This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it
This year=2021 is the current year the current month=1
This is the year=2021 the month=2/>


<This is a line of text with a year=33020 month=12 in it
This line of text does not have a year or month in it
This year=33020 is the current year the current month=1
This is the year=33020 the month=2/>

Using Linux sed (GNU sed 4.2.2) with this regexp:

 sed -En 'N;s/<(This.*2020.*[sSn]*?)>/1/gp' test2.txt

It captures only this string:

<This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it

I am trying to capture the first paragraph, between <...>, as a group.

What am I doing wrong here?

If you want to print only the paragraphs (delimited by <...>) that start with <This and contain 2020, you could try:

sed -En '/^</!d;:a;/>$/!{N;ba;};/<This.*2020/p;' test2.txt

As long as the pattern space does not start with <, it is deleted and a new cycle is started (/^</!d).

Then, as long as the pattern space does not end with >, the next line is appended to it. No new cycle is started; instead we branch back to the a label (/>$/!{N;ba;}).

When a full paragraph is stored in the pattern space, we exit this loop and apply the last command (/<This.*2020/p): if the pattern space matches your pattern, it is printed. Finally, a new cycle starts.
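The three steps described above can also be written as a multi-line script with one comment per step. This is only a sketch for GNU sed: the temporary file and the variable names (sample, result) are made up for illustration, and the input is the sample text from the question.

```shell
# Write the question's sample text to a temporary file (the file name is arbitrary).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it
This year=2021 is the current year the current month=1
This is the year=2021 the month=2/>


<This is a line of text with a year=33020 month=12 in it
This line of text does not have a year or month in it
This year=33020 is the current year the current month=1
This is the year=33020 the month=2/>
EOF

result=$(sed -En '
  # 1. Drop every line that does not open a paragraph and start a new cycle.
  /^</!d
  # 2. Keep appending lines until the pattern space ends with ">".
  :a
  />$/!{
    N
    ba
  }
  # 3. The whole paragraph is now in the pattern space: print it if it matches.
  /<This.*2020/p
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Only the first paragraph is printed; the second one contains 33020, which does not include the substring 2020.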

Of course, the regular expressions must be adapted to your needs. If paragraph delimiters can be preceded (followed) by spaces, for instance, use:

sed -En '/^[[:space:]]*</!d;:a;/>[[:space:]]*$/!{N;ba;};/<This.*2020/p;' test2.txt
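To see the difference between the two variants, here is a small comparison on a made-up two-line paragraph whose delimiters are padded with spaces. The input and the variable names (padded, strict, tolerant) are assumptions for illustration only; GNU sed is assumed, as in the rest of this answer.

```shell
# Hypothetical input: "<" is preceded by spaces and ">" is followed by spaces.
padded=$(printf '%s\n' '  <This paragraph mentions year=2020' 'and ends here/>  ')

# Strict anchors: no line starts with "<", so every line is deleted.
strict=$(printf '%s\n' "$padded" |
  sed -En '/^</!d;:a;/>$/!{N;ba;};/<This.*2020/p;')

# Space-tolerant anchors: the paragraph is collected and printed.
tolerant=$(printf '%s\n' "$padded" |
  sed -En '/^[[:space:]]*</!d;:a;/>[[:space:]]*$/!{N;ba;};/<This.*2020/p;')

printf 'strict:   [%s]\n' "$strict"
printf 'tolerant: [%s]\n' "$tolerant"
```

The strict form prints nothing for this input, while the tolerant form prints the paragraph unchanged, padding included.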

With GNU Awk, you can specify RS to be a regular expression.

$ gawk -v RS='[<>]' '/This.*2020/' <<:
> <This is a line of text with a year=2020 month=12 in it
> This line of text does not have a year or month in it
> This year=2021 is the current year the current month=1
> This is the year=2021 the month=2/>
>
> <This is a line of text with a year=33020 month=12 in it
> This line of text does not have a year or month in it
> This year=33020 is the current year the current month=1
> This is the year=33020 the month=2/>
> :
This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it
This year=2021 is the current year the current month=1
This is the year=2021 the month=2/

As you can see, this also trims the delimiter; but adding it back is not too hard (hint: { print "<" $0 ">" }).
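Spelled out, the hint might look like the sketch below. The variable names (awk_cmd, wrapped) are made up for illustration, and the fallback to the system awk is an assumption: it only helps if that awk also accepts a regular expression as RS, which is not guaranteed for every implementation.

```shell
# Prefer gawk; fall back to the system awk (assumption: it also accepts a regex RS).
awk_cmd=$(command -v gawk || command -v awk)

wrapped=$(printf '%s\n' \
  '<This is a line of text with a year=2020 month=12 in it' \
  'This line of text does not have a year or month in it' \
  'This year=2021 is the current year the current month=1' \
  'This is the year=2021 the month=2/>' \
  '' \
  '<This is a line of text with a year=33020 month=12 in it' \
  'This line of text does not have a year or month in it' \
  'This year=33020 is the current year the current month=1' \
  'This is the year=33020 the month=2/>' \
  | "$awk_cmd" -v RS='[<>]' '/This.*2020/ { print "<" $0 ">" }')

printf '%s\n' "$wrapped"
```

Each matching record is printed with the < and > that RS consumed put back around it, so the output is the original first paragraph.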