Why are my async requests slower than sync ones?
I need to make 100 GET requests to create 100 BeautifulSoup objects from different pages.
To practice my async skills I’ve written two functions, each of which makes 100 GET requests and creates 100 BeautifulSoup objects from the same page. I also need to use sleep because I’m working with imdb.com and they don’t like too many GET requests:
    import asyncio
    import time

    import requests
    from bs4 import BeautifulSoup

    # Note: get_response_text(url) is my own coroutine; its code is not shown here.

    # Gets a BeautifulSoup from a url asynchronously
    async def get_page_soup(url):
        response_text = await get_response_text(url)
        return BeautifulSoup(response_text, features="html.parser")

    # Fetches the same page num_soups times, awaiting one request at a time
    async def get_n_soups_async(url, num_soups=100):
        soup = await get_page_soup(url)
        for i in range(num_soups - 1):
            soup = await get_page_soup(url)
            await asyncio.sleep(0.5)
        return soup

    # Same as above, but with blocking requests.get calls
    def get_n_soups_sync(url, num_soups=100):
        soup = BeautifulSoup(requests.get(url).text, features="html.parser")
        for i in range(num_soups - 1):
            soup = BeautifulSoup(requests.get(url).text, features="html.parser")
            time.sleep(0.5)
        return soup

    async def main():
        print("Async main() has started... ")
        t1 = time.perf_counter()
        soup = await get_n_soups_async('https://www.imdb.com/name/nm0425005', 100)
        t2 = time.perf_counter()
        print(t2 - t1, type(soup))
        t1 = time.perf_counter()
        soup = get_n_soups_sync('https://www.imdb.com/name/nm0425005', 100)
        t2 = time.perf_counter()
        print(t2 - t1, type(soup))
        print("Async main() is over.")

    asyncio.run(main())
What I can’t understand is why my async function takes around 270 seconds to run, while my sync one needs only around
What am I doing wrong in using async, and how can I fix it to speed up getting 100 soups?
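In my opinion the cause is in the loop. In the async version you await each response before sending the next request, so the requests still run one after another and nothing overlaps. The sync version waits for each response too, so awaiting inside a loop gives async no advantage and only adds event-loop overhead on top. A gap as large as the one you measured probably also depends on what get_response_text does, which isn't shown in the question.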
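To see the effect of awaiting inside a loop without touching imdb.com, here is a small sketch that replaces the HTTP call with a simulated one. The names fake_request, one_by_one, all_at_once and timed are made up for this illustration, and the 0.1 second latency is an arbitrary assumption.

```python
import asyncio
import time


async def fake_request(i, latency=0.1):
    # Hypothetical stand-in for one HTTP request;
    # asyncio.sleep simulates the network latency.
    await asyncio.sleep(latency)
    return i


async def one_by_one(n):
    # Same shape as the loop in the question:
    # each await finishes before the next request starts.
    return [await fake_request(i) for i in range(n)]


async def all_at_once(n):
    # All requests are started first, then awaited together.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


def timed(coro):
    # Runs a coroutine to completion and returns (result, elapsed seconds).
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, t_seq = timed(one_by_one(10))
    _, t_con = timed(all_at_once(10))
    print(f"one by one: {t_seq:.2f}s, all at once: {t_con:.2f}s")
```

The first variant should take roughly n times the latency, the second roughly one latency, because only the second lets the waits overlap.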
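Python has no promises, but a task or future plays the same role. You could wrap each request in one (this even works for the blocking sync call if you run it in a thread), start them all, and then await the whole batch with asyncio.gather instead of awaiting every single one. Keep the number of simultaneous requests small, since imdb.com doesn't like too many at once.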
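One possible reading of that suggestion in Python, as a rough sketch rather than a drop-in fix: run the blocking call in a worker thread with asyncio.to_thread (Python 3.9+), cap the number of simultaneous calls with a semaphore, and await the whole batch once. The names get_n_results and fake_get_soup, and the max_concurrent and delay defaults, are assumptions made for this example. The fake function stands in for the real request so the sketch runs without network access.

```python
import asyncio
import time


async def get_n_results(blocking_call, urls, max_concurrent=5, delay=0.5):
    """Run blocking_call(url) for every url in worker threads, a few at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(url):
        async with semaphore:
            # asyncio.to_thread runs the blocking call in a separate thread,
            # so the event loop stays free to start other workers.
            result = await asyncio.to_thread(blocking_call, url)
            # Pause while still holding the slot, to stay polite to the server.
            await asyncio.sleep(delay)
            return result

    # One awaitable for the whole batch; results come back in the order of urls.
    return await asyncio.gather(*(worker(url) for url in urls))


def fake_get_soup(url):
    # Hypothetical stand-in for
    # BeautifulSoup(requests.get(url).text, features="html.parser")
    time.sleep(0.05)
    return f"soup for {url}"


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(10)]
    soups = asyncio.run(get_n_results(fake_get_soup, urls, delay=0.0))
    print(len(soups), soups[0])
```

To use it for real, pass a function that does the requests.get and BeautifulSoup work in place of fake_get_soup. How high max_concurrent and how low delay can go before imdb.com starts refusing requests is something you would have to find out carefully.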