What is the best performance I can get by querying DynamoDB for a maximum 1MB?

I am using DynamoDB to store data, and I see that 1 MB is the hard limit on how much a single query can return. I have a use case that queries a table to fetch 1 MB of data from one partition, and I'd like to know the best performance I can get.

Based on the DynamoDB documentation, one partition can serve a maximum of 3,000 RCU. Since an eventually consistent read covers 8 KB per RCU, one partition should be able to return 3,000 * 8 KB = 24,000 KB ≈ 23 MB per second.

If I send one query request to fetch 1 MB from one partition, does this mean it should respond in about 1/23 of a second ≈ 43 milliseconds?

I am testing with a Lambda function that sends a query to DynamoDB, with X-Ray enabled. The X-Ray trace shows the query takes 300 ms or more, and I don't understand what causes the long latency.
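For reference, the test is roughly equivalent to the sketch below; the table name, key name, and X-Ray instrumentation via the `aws_xray_sdk` package are assumptions for illustration, not my exact code:

```python
# Minimal sketch of the kind of Lambda handler being measured.
# Table name and key name are placeholders; aws_xray_sdk patches boto3
# so each DynamoDB call appears as a subsegment in the X-Ray trace.
import time

import boto3
from boto3.dynamodb.conditions import Key
from aws_xray_sdk.core import patch_all

patch_all()  # instrument boto3 so DynamoDB calls show up in X-Ray

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table


def handler(event, context):
    start = time.perf_counter()
    response = table.query(
        KeyConditionExpression=Key("pk").eq(event["pk"]),  # single partition
        ConsistentRead=False,  # eventually consistent read
    )
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"items={len(response['Items'])} latency={elapsed_ms:.1f} ms")
    return {"count": len(response["Items"]), "latency_ms": elapsed_ms}
```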

What can I do to reduce the latency to single-digit milliseconds? I don't want to split the partition, since 1 MB is not really a big size.

DynamoDB really is capable of single-digit millisecond latency, but only if the item is small enough to be read with roughly 1 RCU (4 KB). Reading 1 MB of data from a database in under 10 ms is a challenging task in itself.

Here is what you can try:

  1. Split your read operation into two.

One query will use ScanIndexForward: true + Limit: N/2 and the other ScanIndexForward: false + Limit: N/2. The idea is to read the same data from both ends toward the middle.

Run both queries in parallel and then merge the two responses into one.
However, this will likely only cut the latency from ~300 ms to ~150 ms, which is still not under 10 ms.
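A minimal sketch of this bidirectional split, assuming boto3 and a table with a partition key `pk` and sort key `sk` (placeholder names), could look like this:

```python
# Sketch: query the same partition from both ends in parallel and merge.
# Table name, key names, and the item count N are assumptions for illustration.
from concurrent.futures import ThreadPoolExecutor

import boto3
from boto3.dynamodb.conditions import Key

N = 1000  # expected number of items in the 1 MB result


def query_half(forward: bool):
    # Each call gets its own resource handle; boto3 resources are not thread-safe.
    table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table
    return table.query(
        KeyConditionExpression=Key("pk").eq("my-partition"),
        ScanIndexForward=forward,  # True = ascending, False = descending
        Limit=N // 2,
    )["Items"]


with ThreadPoolExecutor(max_workers=2) as pool:
    front = pool.submit(query_half, True)   # first half, ascending
    back = pool.submit(query_half, False)   # second half, descending

# Reverse the descending half and concatenate, then deduplicate by sort key
# in case the two halves overlap in the middle.
items = front.result() + list(reversed(back.result()))
seen, merged = set(), []
for item in items:
    if item["sk"] not in seen:
        seen.add(item["sk"])
        merged.append(item)
```

The deduplication step guards against the two halves overlapping when the partition holds fewer than N items.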

  2. Use DAX – the DynamoDB caching layer.
  3. If your 1 MB of data is spread across thousands of items, consider using fewer items, each holding more data.
  4. Consider using a compression algorithm like Brotli to compress the data you store in a single DynamoDB item. I once had success with this approach: depending on the format, it can easily reduce the data size by 4x, which translates into a roughly 4x faster query, and could be about 8x faster combined with the approach described in item #1. A sketch is shown after this list.
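To illustrate item #4, here is a minimal sketch assuming the `brotli` Python package and a JSON payload; the table and attribute names are placeholders:

```python
# Sketch: compress a JSON payload with Brotli before writing it as a single
# binary attribute, and decompress it after reading. Names are placeholders.
import json

import boto3
import brotli  # pip install brotli

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table


def put_blob(pk: str, payload: dict) -> None:
    compressed = brotli.compress(json.dumps(payload).encode("utf-8"))
    table.put_item(Item={"pk": pk, "blob": compressed})  # stored as Binary


def get_blob(pk: str) -> dict:
    item = table.get_item(Key={"pk": pk})["Item"]
    raw = brotli.decompress(item["blob"].value)  # .value unwraps the Binary type
    return json.loads(raw.decode("utf-8"))
```

Keep in mind that a single DynamoDB item is limited to 400 KB, so the compressed payload has to fit under that limit.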

Also, beware that constantly reading 1 MB of data from the database will incur significant costs.