According to the AWS Docs, DynamoDB is:
[…] a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. You can use Amazon DynamoDB to create a database table that can store and retrieve any amount of data, and serve any level of request traffic. Amazon DynamoDB automatically spreads the data and traffic for the table over a sufficient number of servers to handle the request capacity specified by the customer and the amount of data stored, while maintaining consistent and fast performance.
From having worked with DynamoDB it is clear that it is, like many other databases, a powerful tool when used correctly. Surely, there are already a shit-ton of blog posts and articles on best practices and practical tips. Some of my learnings might be a candidate for a future post, but this one is about a very specific topic: how to calculate the size of a DynamoDB item.
There are already a couple of tools that do this. I will list the ones I’ve seen:
Thanks to the authors/contributors of the above listed tools for the inspiration!
It’s simple. I’m mostly writing Go and would like there to be a way to calculate the size of a DynamoDB item before attempting to write the item.
⚠️ Coming back to best practices: If you have defined your schema in a way that your objects run the risk of approaching the size limit, you should probably consider a different approach. Anyway, this is about size calculation of items, not best practices.
The solution is to use the dynamodb/attributevalue package from the aws-sdk-go-v2 library. The MarshalMap function from this package will recursively marshal a Go struct into a DynamoDB attribute value map. It has the following signature:
func attributevalue.MarshalMap(in interface{}) (map[string]types.AttributeValue, error)
ℹ️ Alternatively, we can use the Marshal function to marshal a Go struct into a single types.AttributeValue. However, for Go structs this will always return a types.AttributeValueMemberM (map) at the top level, which does not provide any additional value. Additionally, the dynamodb Client's PutItem method has the signature func (c *Client) PutItem(ctx context.Context, params *PutItemInput, optFns ...func(*Options)) (*PutItemOutput, error), which takes a PutItemInput struct. Since the PutItemInput’s required Item field is of type map[string]types.AttributeValue, we might as well use MarshalMap to get the map[string]types.AttributeValue directly.
Now, we can recursively traverse the map[string]types.AttributeValue and calculate the size of each attribute name and value pair. For each AttributeValue type, we reference the DynamoDB Item size and formats doc to calculate the size of the item. Below is some of the relevant information from the doc and some pseudo code for the calculation:
Strings (types.AttributeValueMemberS):
- “Strings are Unicode with UTF-8 binary encoding. The size of a string is (number of UTF-8-encoded bytes of attribute name) + (number of UTF-8-encoded bytes).”
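Go strings are byte sequences that conventionally hold UTF-8, and len returns the number of bytes (not characters), so we can use it directly. In aws-sdk-go-v2 the Value field is a plain string, so there is no pointer to dereference.
size += len(av.Value)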
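To make the bytes-versus-characters point concrete, here is a minimal sketch. The helper names stringAttrSize and runeCount are my own for illustration and are not part of the SDK or the ddbcalc repo.

```go
package main

import "unicode/utf8"

// stringAttrSize estimates the size of a string attribute following the
// quoted rule: UTF-8 bytes of the attribute name plus UTF-8 bytes of the value.
// Hypothetical helper, not an SDK function.
func stringAttrSize(name, value string) int {
	return len(name) + len(value)
}

// runeCount is included only for contrast: it counts characters (runes),
// which is NOT what DynamoDB uses for sizing.
func runeCount(s string) int {
	return utf8.RuneCountInString(s)
}
```

For the value "h\u00e9llo" (five characters, one of which takes two bytes in UTF-8), len reports 6 while runeCount reports 5, so the byte-based figure is the one to use.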
Numbers (types.AttributeValueMemberN):
- “Numbers are variable length, with up to 38 significant digits. Leading and trailing zeroes are trimmed. The size of a number is approximately (number of UTF-8-encoded bytes of attribute name) + (1 byte per two significant digits) + (1 byte).”
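The SDK represents numbers as strings, so as a simple approximation we can use len to get the number of bytes of the number's string representation. Note that this is not the formula quoted above (roughly one byte per two significant digits, plus one byte), so the result can differ from what DynamoDB reports.
size += len(av.Value)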
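If a figure closer to the documented rule is wanted, the quoted formula can be sketched as follows. This is only an approximation under stated assumptions: numberValueSize is a hypothetical helper, it expects plain decimal notation without an exponent, and the docs themselves only describe the number size as approximate.

```go
package main

import "strings"

// numberValueSize approximates the size of a number value using the quoted
// rule: 1 byte per two significant digits, plus 1 byte.
// Assumption: n is in plain decimal notation (no exponent such as "1e5").
// Hypothetical helper, not an SDK function.
func numberValueSize(n string) int {
	n = strings.TrimLeft(n, "+-")            // the sign is not a digit
	digits := strings.Replace(n, ".", "", 1) // drop the decimal point
	digits = strings.Trim(digits, "0")       // trim leading and trailing zeroes
	return (len(digits)+1)/2 + 1             // ceil(digits/2) + 1
}
```

For example, "12345" has five significant digits, giving 3 + 1 = 4 bytes, whereas len("12345") is 5. The attribute name still has to be added on top, as for every other type.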
Binary (types.AttributeValueMemberB):
- “A binary value must be encoded in base64 format before it can be sent to DynamoDB, but the value’s raw byte length is used for calculating size. The size of a binary attribute is (number of UTF-8-encoded bytes of attribute name) + (number of raw bytes).”
In the SDK, Value is a []byte holding the raw (not base64-encoded) bytes, so len gives exactly the raw byte length the doc asks for.
size += len(av.Value)
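The difference between raw and base64-encoded length is easy to check with the standard library. The helper names below (binaryAttrSize, encodedLen) are my own and exist only for this illustration.

```go
package main

import "encoding/base64"

// binaryAttrSize estimates the size of a binary attribute following the
// quoted rule: UTF-8 bytes of the attribute name plus the raw byte count.
// Hypothetical helper, not an SDK function.
func binaryAttrSize(name string, raw []byte) int {
	return len(name) + len(raw)
}

// encodedLen returns the length of the base64 form of raw. It is shown only
// for contrast; this is NOT the number used for sizing.
func encodedLen(raw []byte) int {
	return len(base64.StdEncoding.EncodeToString(raw))
}
```

For the five raw bytes of "hello", the base64 form is eight characters long, but only the five raw bytes count toward the item size.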
Boolean (types.AttributeValueMemberBOOL) and Null (types.AttributeValueMemberNULL):
- “The size of a null attribute or a Boolean attribute is (number of UTF-8-encoded bytes of attribute name) + (1 byte).”
size += 1
Lists (types.AttributeValueMemberL) and Maps (types.AttributeValueMemberM):
- “An attribute of type List or Map requires 3 bytes of overhead, regardless of its contents. The size of a List or Map is (number of UTF-8-encoded bytes of attribute name) + sum (size of nested elements) + (3 bytes). The size of an empty List or Map is (number of UTF-8-encoded bytes of attribute name) + (3 bytes).”
- “Each List or Map element also requires 1 byte of overhead.”
// list
size += 3
for _, v := range av.Value {
	size += calculateSize(v)
	size++
}
// map
size += 3
for k, v := range av.Value {
	size += len(k)
	size += calculateSize(v)
	size++
}
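Putting the rules above together, the recursion might look roughly like the sketch below. To keep it free of external dependencies, it does not use the SDK: the strAttr, numAttr, and similar types are hypothetical stand-ins that only mirror the shape of the types.AttributeValueMember* structs, and valueSize and itemSize are names I made up for this illustration. It is not the ddbcalc implementation.

```go
package main

// Stand-in types (hypothetical): they mirror the shape of the SDK's
// types.AttributeValueMember* structs so the sketch needs no dependencies.
type (
	attr     interface{}
	strAttr  struct{ Value string }
	numAttr  struct{ Value string }
	binAttr  struct{ Value []byte }
	boolAttr struct{ Value bool }
	nullAttr struct{ Value bool }
	listAttr struct{ Value []attr }
	mapAttr  struct{ Value map[string]attr }
)

// valueSize returns the estimated size in bytes of a single value,
// excluding the name of the attribute that holds it.
func valueSize(av attr) int {
	switch v := av.(type) {
	case strAttr:
		return len(v.Value) // UTF-8 bytes
	case numAttr:
		return len(v.Value) // simplification: length of the string form
	case binAttr:
		return len(v.Value) // raw bytes
	case boolAttr, nullAttr:
		return 1
	case listAttr:
		size := 3 // fixed list overhead
		for _, e := range v.Value {
			size += valueSize(e) + 1 // 1 byte of overhead per element
		}
		return size
	case mapAttr:
		size := 3 // fixed map overhead
		for k, e := range v.Value {
			size += len(k) + valueSize(e) + 1
		}
		return size
	default:
		return 0 // unknown kinds (e.g. sets) are ignored in this sketch
	}
}

// itemSize sums attribute-name and value sizes over the top-level item.
func itemSize(item map[string]attr) int {
	size := 0
	for name, av := range item {
		size += len(name) + valueSize(av)
	}
	return size
}
```

As a worked example, an item with id = "abc" (2 + 3 bytes), ok = true (2 + 1 bytes), and tags = ["a", "bc"] (4 + 3 + (1 + 1) + (2 + 1) bytes) comes out at 5 + 3 + 12 = 20 bytes. With the real SDK, the type switch would match on the pointer types (for example *types.AttributeValueMemberS) instead of these stand-ins.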
For the actual implementation, see the github.com/ryeguard/ddbcalc repo.
When writing the tests in the repo, I realized that the DynamoDB set types, i.e., types.AttributeValueMemberSS, types.AttributeValueMemberNS, and types.AttributeValueMemberBS, will not be used by the MarshalMap function by default. This follows quite naturally because Go does not have a built-in set type (unless you are using a map as a kind of set).
In DynamoDB, a set type can represent multiple scalar values of type string, number, or binary. The properties of a set differ from those of a list according to Supported data types and naming rules in Amazon DynamoDB as follows:
In other words, if we want to utilize the set types while using Go, we will have to tag our struct field in question with the dynamodbav:",stringset", dynamodbav:",numberset", or dynamodbav:",binaryset" tag like so:
type Item struct {
	MyStringSetField []string `dynamodbav:",stringset"`
	MyNumberSetField []int    `dynamodbav:",numberset"`        // number sets need numeric elements
	MyBinarySetField [][]byte `dynamodbav:"myField,binaryset"` // also changing the field name to "myField"
}
Since sets are not an inherent part of Go, it is arguably more natural to use lists instead. However, it is good to be aware of the possibility of using sets as it might work really well for some use cases.
Resources I found useful: