# Python: Extract block of code from HTML document?


I’m trying to extract blocks of code from an HTML document and save them to a Markdown file with Python. I’m stumped because the HTML document numbers each line of code, and I need to remove those numbers for the code to appear properly in the Markdown file. When I use `re.sub` with the pattern `\d+\n` to remove the line numbers, lines of code that end in a number also lose characters, since they match the pattern too. For example, the `2\n` just before the start of line 29 in the string below gets deleted. That `2` belongs to the end of line 28 of the code and shouldn’t be deleted.

The relevant string from the HTML page is:

"1\nclass Solution:\n2\n def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:\n3\n if len(nums1) > len(nums2):\n4\n nums1, nums2 = nums2, nums1\n5\n # Lengths of two arrays\n6\n m = len(nums1)\n7\n n = len(nums2)\n8\n # Pointers for binary search\n9\n start = 0\n10\n end = m\n11\n # Binary search starts from here\n12\n while start <= end:\n13\n # Partition indices for both the arrays\n14\n partition_nums1 = (start + end) // 2\n15\n partition_nums2 = (m + n + 1) // 2 - partition_nums1\n16\n # Edge cases\n17\n # If there are no elements left on the left side after partition\n18\n maxLeftNums1 = -sys.maxsize if partition_nums1 == 0 else nums1[partition_nums1 - 1]\n19\n # If there are no elements left on the right side after partition\n20\n minRightNums1 = sys.maxsize if partition_nums1 == m else nums1[partition_nums1]\n21\n # Similarly for nums2\n22\n maxLeftNums2 = -sys.maxsize if partition_nums2 == 0 else nums2[partition_nums2 - 1]\n23\n minRightNums2 = sys.maxsize if partition_nums2 == n else nums2[partition_nums2]\n24\n # Check if we have found the match\n25\n if maxLeftNums1 <= minRightNums2 and maxLeftNums2 <= minRightNums1:\n26\n # Check if the combined array is of even/odd length\n27\n if (m + n) % 2 == 0:\n28\n return (max(maxLeftNums1, maxLeftNums2) + min(minRightNums1, minRightNums2)) / 2\n29\n else:\n30\n return max(maxLeftNums1, maxLeftNums2)\n31\n # If we are too far on the right, we need to go to left side\n32\n elif maxLeftNums1 > minRightNums2:\n33\n end = partition_nums1 - 1\n34\n # If we are too far on the left, we need to go to right side\n35\n else:\n36\n start = partition_nums1 + 1"

Is there a better way to remove the line numbering or extract the code from the HTML document?

What you are looking to remove are numbers that appear exactly at the beginning of each line, so you could do the following:

```python
import re

# Match a line that holds only a line number, together with its newline.
exp = re.compile(r'^\d+\n', re.MULTILINE)
# obtain code...
code = exp.sub('', code)
```

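The `^` at the beginning of the regexp means “match the start of the string”. The `re.MULTILINE` flag makes it also match at the start of every line in the string, that is, immediately after each newline. A digit at the end of a line of code is not at the start of a line, so it no longer matches and is left alone.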
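To see the difference between the two patterns, here is a minimal sketch on a short made-up string. The `sample` value is hypothetical; it only imitates the shape of the scraped text (a line number on its own line, followed by a line of code).

```python
import re

# Hypothetical sample in the same shape as the scraped string:
# each line number sits on its own line, followed by a line of code.
sample = "1\nx = 10 / 2\n2\ny = x + 1\n"

# Unanchored pattern: also eats digits at the end of a code line.
naive = re.sub(r'\d+\n', '', sample)

# Anchored pattern: only matches digits that start a line.
anchored = re.sub(r'^\d+\n', '', sample, flags=re.MULTILINE)

print(repr(naive))     # 'x = 10 / y = x + '
print(repr(anchored))  # 'x = 10 / 2\ny = x + 1\n'
```

One caveat with this approach: a line of code that consists of nothing but digits starting in the first column would also be removed, so it is worth checking the output if the extracted code could contain such lines.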