Simon Rygård

Calculating the size of a DynamoDB item

Background

According to the AWS Docs, DynamoDB is:

[…] a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. You can use Amazon DynamoDB to create a database table that can store and retrieve any amount of data, and serve any level of request traffic. Amazon DynamoDB automatically spreads the data and traffic for the table over a sufficient number of servers to handle the request capacity specified by the customer and the amount of data stored, while maintaining consistent and fast performance.

From having worked with DynamoDB, it is clear that it is, like many other databases, a powerful tool when used correctly. Surely, there are already a shit-ton of blog posts and articles on best practices and practical tips. Some of my learnings might be candidates for a future post, but this one is about a very specific topic: how to calculate the size of a DynamoDB item.

There are already a couple of tools that do this. I will list the ones I’ve seen:

Thanks to the authors/contributors of the above listed tools for the inspiration!

Problem & Solution

It’s simple. I’m mostly writing Go and would like there to be a way to calculate the size of a DynamoDB item before attempting to write the item.

⚠️ Coming back to best practices: If you have defined your schema in a way that your objects run the risk of approaching the size limit, you should probably consider a different approach. Anyway, this is about size calculation of items, not best practices.

The solution is to use the dynamodb/attributevalue package from the aws-sdk-go-v2 library. The MarshalMap function from this package recursively marshals a Go struct into a DynamoDB attribute value map. It has the following signature:

func attributevalue.MarshalMap(in interface{}) (map[string]types.AttributeValue, error)

ℹ️ Alternatively, we can use the Marshal function to marshal a Go struct into a single types.AttributeValue. However, for Go structs this always returns a types.AttributeValueMemberM (map) at the top level, which adds nothing useful. Additionally, the dynamodb Client's PutItem method has the signature func (c *Client) PutItem(ctx context.Context, params *PutItemInput, optFns ...func(*Options)) (*PutItemOutput, error), which takes a PutItemInput struct. Since PutItemInput’s required Item field is of type map[string]types.AttributeValue, we might as well use MarshalMap to get that map directly.

Now, we can recursively traverse the map[string]types.AttributeValue and calculate the size of each attribute+value pair. For every particular AttributeValue type, we reference the DynamoDB Item size and formats doc to calculate the size of the item. Below is some of the relevant information from the doc and some pseudo code for the calculation:
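The traversal can be sketched roughly as follows. This is a minimal sketch, not the ddbcalc implementation. To keep it runnable with the standard library only, it uses hypothetical stand-in types (S, N, B, BOOL, NULL, L, M, SS, NS, BS) instead of the SDK's types.AttributeValueMember* types. The byte counts follow my reading of the item size doc: UTF-8 length for names and strings, roughly one byte per two significant digits plus one byte for numbers, one byte for null and boolean, and three bytes of overhead plus one byte per element for lists and maps. Treating a set as the sum of its elements, and the 400 KB limit as 400 * 1024 bytes, are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// AttributeValue is a hypothetical stand-in for types.AttributeValue from
// aws-sdk-go-v2, so that this sketch compiles without external modules.
type AttributeValue interface{}

type (
	S    struct{ Value string }
	N    struct{ Value string } // numbers are carried as strings, as in the SDK
	B    struct{ Value []byte }
	BOOL struct{ Value bool }
	NULL struct{}
	L    struct{ Value []AttributeValue }
	M    struct{ Value map[string]AttributeValue }
	SS   struct{ Value []string }
	NS   struct{ Value []string }
	BS   struct{ Value [][]byte }
)

// MaxItemSize assumes the 400 KB item limit means 400 * 1024 bytes.
const MaxItemSize = 400 * 1024

// numberSize approximates a number as 1 byte per two significant digits
// plus 1 byte. It ignores exponent notation, which is a simplification.
func numberSize(n string) int {
	digits := strings.TrimLeft(n, "+-")
	digits = strings.Replace(digits, ".", "", 1)
	digits = strings.Trim(digits, "0") // leading/trailing zeros are not significant
	return (len(digits)+1)/2 + 1
}

// valueSize returns the approximate size in bytes of a single value.
func valueSize(av AttributeValue) int {
	size := 0
	switch v := av.(type) {
	case S:
		size = len(v.Value) // len of a Go string is its UTF-8 byte count
	case N:
		size = numberSize(v.Value)
	case B:
		size = len(v.Value)
	case BOOL, NULL:
		size = 1
	case L:
		size = 3 // fixed overhead for a list
		for _, e := range v.Value {
			size += 1 + valueSize(e) // 1 byte of overhead per element
		}
	case M:
		size = 3 // fixed overhead for a map
		for k, e := range v.Value {
			size += 1 + len(k) + valueSize(e)
		}
	case SS: // assumption: a set costs the sum of its elements
		for _, s := range v.Value {
			size += len(s)
		}
	case NS:
		for _, n := range v.Value {
			size += numberSize(n)
		}
	case BS:
		for _, b := range v.Value {
			size += len(b)
		}
	}
	return size
}

// ItemSize sums attribute name and value sizes over the top-level item.
func ItemSize(item map[string]AttributeValue) int {
	size := 0
	for name, av := range item {
		size += len(name) + valueSize(av)
	}
	return size
}

func main() {
	item := map[string]AttributeValue{
		"id":   S{Value: "abc"},
		"n":    N{Value: "123"},
		"tags": L{Value: []AttributeValue{S{Value: "a"}, S{Value: "b"}}},
	}
	size := ItemSize(item)
	fmt.Println(size, size <= MaxItemSize)
}
```

With the real SDK, the same type switch would match on pointer types such as *types.AttributeValueMemberS, and the input would be the map returned by MarshalMap.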

For the actual implementation, see the github.com/ryeguard/ddbcalc repo.

Details

When writing the tests in the repo, I realized that the Dynamo Set types, i.e., types.AttributeValueMemberSS, types.AttributeValueMemberNS, and types.AttributeValueMemberBS, will not be used by the MarshalMap function by default. This follows quite naturally because Go does not have a built-in set type (unless you are using a map as a kind of set).

In Dynamo, a set type can represent multiple scalar values of type string, number, or binary. According to Supported data types and naming rules in Amazon DynamoDB, the properties of a set differ from those of a list as follows:

In other words, if we want to utilize the set types while using Go, we will have to tag our struct field in question with the dynamodbav:",stringset", dynamodbav:",numberset", or dynamodbav:",binaryset" tag like so:

type Item struct {
  MyStringSetField []string `dynamodbav:",stringset"`
  MyNumberSetField []int    `dynamodbav:",numberset"` // numberset expects a slice of a numeric type
  MyBinarySetField [][]byte `dynamodbav:"myField,binaryset"` // binaryset expects [][]byte; also changing the field name to "myField"
}

Since sets are not an inherent part of Go, it is arguably more natural to use lists instead. However, it is good to be aware of the possibility of using sets as it might work really well for some use cases.
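One practical difference worth keeping in mind when tagging a slice as a set: the DynamoDB docs require every value within a set to be unique, whereas a Go slice happily holds duplicates. A minimal sketch of deduplicating a slice before it is marshaled as a string set could look like this. The helper name uniqueStrings is hypothetical and not part of the SDK or of ddbcalc.

```go
package main

import "fmt"

// uniqueStrings is a hypothetical helper that returns the values of in with
// duplicates removed, keeping the first occurrence of each value.
func uniqueStrings(in []string) []string {
	seen := make(map[string]struct{}, len(in))
	out := make([]string, 0, len(in))
	for _, s := range in {
		if _, ok := seen[s]; ok {
			continue // already present, skip the duplicate
		}
		seen[s] = struct{}{}
		out = append(out, s)
	}
	return out
}

func main() {
	tags := []string{"go", "aws", "go", "dynamodb", "aws"}
	fmt.Println(uniqueStrings(tags))
}
```

The map[string]struct{} used here is the same "map as a kind of set" idiom mentioned earlier, so the deduplicated slice can then be assigned to a field tagged with dynamodbav:",stringset".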

Resources I found useful: