Migrate Your Amazon DynamoDB Table to Oracle NoSQL Database Cloud Service

Migrate your DynamoDB tables to Oracle NoSQL Database Cloud Service using the OCI NoSQL Database Migrator tool

Shadab Mohammad
Oracle Developers


Oracle NoSQL Database Cloud Service (NDCS) is a key-value database like Amazon DynamoDB: it is schema-less, schema-flexible, and horizontally scalable. It requires no provisioning of servers or instance types; you simply create a table and you are ready to read and write to it with single-digit millisecond latency. Oracle NDCS supports both on-demand and provisioned capacity with storage-based provisioning, and it supports JSON, table, and key-value data types.

To help our customers migrate from other NoSQL databases, Oracle offers a NoSQL Database Migrator utility that can help you migrate your DynamoDB tables to OCI NDCS using an S3 bucket.

In this blog post we will walk through the steps to migrate a DynamoDB table to Oracle NoSQL Database Cloud Service using the migrator utility and a bit of Python3 code.

High-level Steps to Migrate DynamoDB Table to OCI NDCS

1. Set up and seed the source DynamoDB table

2. Download and unzip the NoSQL Database Migrator tool, and install awscli and oci-cli to create your AWS and OCI credentials

3. Export the DynamoDB table to an S3 bucket

4. Create config.json with source and sink information and run the Migrator utility

5. Check the OCI NoSQL table after import

1. AWS DynamoDB Table Setup

For this demonstration we will use a sample table called PatientHealthRecords in DynamoDB. PatientHealthRecords has a partition key on PatientID and a sort key on RecordDate.

In DynamoDB, a table is a collection of items, and each item is a collection of attributes. Each item in the table has a unique identifier, or a primary key. Other than the primary key, the table is schema-less. Each item can have its own distinct attributes.

Partition key — A simple primary key, composed of one attribute known as the partition key. DynamoDB uses the partition key’s value as input to an internal hash function. The output from the hash function determines the partition in which the item will be stored.

Partition key and sort key — As a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key. DynamoDB uses the partition key value as input to an internal hash function. The output from the hash function determines the partition in which the item will be stored. All items with the same partition key value are stored together, in sorted order by sort key value.
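For example, a single item in the PatientHealthRecords table might look like this in DynamoDB JSON (a hypothetical record, shown with only a few attributes), with PatientID acting as the partition key and RecordDate as the sort key:

{
  "PatientID": {"S": "c1f9e6a2-0d4b-4a7e-9f3c-2b8d5e7a1c90"},
  "RecordDate": {"S": "2024-03-01"},
  "PatientName": {"S": "Jane Doe"},
  "Age": {"N": "42"},
  "Diagnosis": {"S": "Seasonal allergies."}
}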

In contrast, Oracle NoSQL tables support flexible data models with both schema and schema-less design.

Let’s create the DynamoDB table with aws-cli

aws dynamodb create-table \
--table-name PatientHealthRecords \
--attribute-definitions \
AttributeName=PatientID,AttributeType=S \
AttributeName=RecordDate,AttributeType=S \
--key-schema \
AttributeName=PatientID,KeyType=HASH \
AttributeName=RecordDate,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST
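To confirm the table has been created and is active before loading data, you can run:

aws dynamodb wait table-exists --table-name PatientHealthRecords

aws dynamodb describe-table --table-name PatientHealthRecords --query 'Table.TableStatus'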

Let’s seed some records to the DynamoDB table using a Python3 script

import boto3
import faker
import sys

def generate_data(size):
    # Generate 'size' fake patient records with Faker
    fake = faker.Faker()
    records = []
    for _ in range(size):
        record = {
            'PatientID': fake.uuid4(),
            'RecordDate': fake.date(),
            'PatientName': fake.name(),
            'Age': fake.random_int(min=0, max=100),
            'Gender': fake.random_element(elements=('Male', 'Female', 'Other')),
            'Diagnosis': fake.sentence(),
            'Treatment': fake.sentence(),
            'DoctorID': fake.uuid4()
        }
        records.append(record)
    return records

def write_data_in_chunks(table_name, data, chunk_size):
    # Write the records to DynamoDB with the batch writer, chunk_size items at a time
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table(table_name)
    for i in range(0, len(data), chunk_size):
        with table.batch_writer() as batch:
            for record in data[i:i+chunk_size]:
                batch.put_item(Item=record)
    print(f"Successfully wrote {len(data)} records to {table_name} in chunks of {chunk_size}.")

if __name__ == "__main__":
    # Number of records to generate is taken from the first command-line argument (default 1000)
    table_name = 'PatientHealthRecords'
    chunk_size = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    data = generate_data(chunk_size)
    write_data_in_chunks(table_name, data, chunk_size)
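The script assumes boto3 and faker are available in your Python3 environment and that your default AWS credentials and region point at the account holding the table. If they are not installed yet:

pip3 install boto3 faker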

Run the script to seed the table

 python3 load_dynamodb_table.py 1000

Check the records from the AWS DynamoDB console.
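Alternatively, you can get a quick record count from the CLI (a Scan consumes read capacity, so use it sparingly on large tables):

aws dynamodb scan --table-name PatientHealthRecords --select COUNT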

2. Set Up the OCI NoSQL Database Migrator Tool on an Oracle Linux VM on OCI

a. Download the utility from here : https://www.oracle.com/database/technologies/nosql-database-server-downloads.html

b. Upload it to an Oracle Linux 7 or Oracle Linux 8 VM and unzip it

cd $HOME

unzip nosql-migrator-1.6.0.zip

c. Install awscli and configure your AWS credentials

https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html#cli-configure-files-methods

$ sudo yum install awscli

$ aws configure
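After aws configure completes, your ~/.aws/credentials file should look roughly like this (placeholder values shown):

[default]
aws_access_key_id = <your_access_key_id>
aws_secret_access_key = <your_secret_access_key>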

d. Install oci-cli and add your API key to the instance, or set up instance principal authentication

$ sudo yum install python36-oci-cli

$ oci setup config

Add your API key details to the OCI configuration file (~/.oci/config):

[DEFAULT]
user=ocid1.user.oc1..<unique_ID>
fingerprint=<your_fingerprint>
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..<unique_ID>
# Some comment
region=us-ashburn-1
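A simple way to confirm the OCI CLI can authenticate (either with the API key config above or with instance principals) is a harmless read-only call, for example fetching your Object Storage namespace:

$ oci os ns get

# or, if you configured instance principal authentication
$ oci os ns get --auth instance_principal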

3. Export DynamoDB Table to S3 Bucket

a. Create an S3 bucket named patienthealthrecords-dynamodb-backup

b. Enable Point-In-Time-Recovery for your DynamoDB table

c. Export Table to S3
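These three steps can be done from the AWS console, or, as a rough CLI sketch (replace REGION and ACCOUNT in the table ARN with your own values):

# Create the S3 bucket
aws s3 mb s3://patienthealthrecords-dynamodb-backup

# Enable point-in-time recovery on the table
aws dynamodb update-continuous-backups \
--table-name PatientHealthRecords \
--point-in-time-recovery-specification PointInTimeRecoveryEnabled=true

# Export the table to the bucket
aws dynamodb export-table-to-point-in-time \
--table-arn arn:aws:dynamodb:REGION:ACCOUNT:table/PatientHealthRecords \
--s3-bucket patienthealthrecords-dynamodb-backup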

After the export is completed you will see the following structure in your bucket: an AWSDynamoDB/<export-id>/ prefix containing a data/ folder with gzipped DynamoDB JSON files.

4. Create config File for OCI NoSQL Database Migrator and Run the Migration Utility

Link : https://docs.oracle.com/en-us/iaas/nosql-database/doc/use-case-demonstrations.html#GUID-12D9BCE9-16F5-4852-999E-AD52320071C8


$ cd nosql-migrator-1.6.0/

$ vim config_PatientHealthRecords.json

Add the source and sink definition here

Note: on the host where you run the OCI NoSQL Database Migrator tool, make sure awscli and oci-cli are configured with their respective credentials so the tool can access both the DynamoDB export in S3 and the OCI NoSQL Database Cloud Service.

{
  "source" : {
    "type" : "aws_s3",
    "format" : "dynamodb_json",
    "s3URL" : "https://patienthealthrecords-dynamodb-backup.s3.ap-southeast-2.amazonaws.com/AWSDynamoDB/01710053125774-99b0e7c9/data",
    "credentials" : "/home/opc/.aws/credentials",
    "credentialsProfile" : "default"
  },
  "sink" : {
    "type" : "nosqldb_cloud",
    "endpoint" : "ap-melbourne-1",
    "table" : "PatientHealthRecords",
    "compartment" : "Shadab",
    "schemaInfo" : {
      "defaultSchema" : true,
      "readUnits" : 10,
      "writeUnits" : 10,
      "DDBPartitionKey" : "PatientID:String",
      "DDBSortKey" : "RecordDate:Timestamp(5)",
      "storageSize" : 1
    },
    "credentials" : "/home/opc/.oci/config",
    "credentialsProfile" : "DEFAULT",
    "writeUnitsPercent" : 90,
    "requestTimeoutMs" : 5000
  },
  "abortOnError" : true,
  "migratorVersion" : "1.0.0"
}

For more information about mapping DynamoDB table to OCI, see this page :

https://docs.oracle.com/en/cloud/paas/nosql-cloud/onscl/index.html#GUID-5845149C-8043-456B-986B-4668F29B9F0A

There are two different ways of modeling a DynamoDB table in OCI NoSQL:

[1] Modeling the DynamoDB table as a JSON document (recommended): all attributes of the DynamoDB table except the partition key and sort key are mapped into a single JSON column of the NoSQL table.

[2] Modeling the DynamoDB table as fixed columns in the NoSQL table: for each attribute of the DynamoDB table, you create a column in the NoSQL table, and the partition key and sort key attributes become the primary key. Use this only when you are certain that the schema of the DynamoDB table being imported is fixed and each item has values for most of the attributes.

Run the database migrator

./runMigrator --config config_PatientHealthRecords.json

Important Note: The table DDL in this case is automatically created by the migrator utility.
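Because defaultSchema is true, the migrator derives the table definition from DDBPartitionKey and DDBSortKey. As a rough sketch only (the exact generated DDL, including the name of the aggregated JSON column, is decided by the migrator), the resulting table is conceptually equivalent to:

CREATE TABLE PatientHealthRecords (
  PatientID STRING,
  RecordDate TIMESTAMP(5),
  document JSON,
  PRIMARY KEY (SHARD(PatientID), RecordDate)
)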

5. Check the Table and Records on OCI NoSQL Database

All attributes of a DynamoDB item, except the partition key and sort key, are aggregated into a single JSON column of the NoSQL table.
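You can spot-check the imported rows from the OCI console or any NoSQL SQL client with a simple query, for example:

SELECT * FROM PatientHealthRecords LIMIT 5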

Incremental Restore

You might have a live DynamoDB table with transactions still being written to it while you export the table to S3. In that case, in addition to a full restore, you can do an incremental export and import.

Initiate an incremental export of the DynamoDB table to the S3 bucket

Create a new folder called new_records in the same bucket or a new bucket

Check the folder structure after the incremental export completes and note the data/ folder where the .gz files are located

You can also do a consistent full export and incremental export of DynamoDB to your S3 bucket using awscli

# Calculate Unix epoch time in seconds
date +%s
1710374718

# Full export
aws dynamodb export-table-to-point-in-time \
--table-arn arn:aws:dynamodb:REGION:ACCOUNT:table/TABLENAME \
--s3-bucket bucketname \
--s3-prefix exports/ \
--s3-sse-algorithm AES256 \
--export-time 1710374718

# Incremental export, starting at the end time of the full export
aws dynamodb export-table-to-point-in-time \
--table-arn arn:aws:dynamodb:REGION:ACCOUNT:table/TABLENAME \
--s3-bucket bucketname \
--s3-prefix exports_incremental/ \
--incremental-export-specification ExportFromTime=1710374718,ExportToTime=1710374998,ExportViewType=NEW_IMAGE \
--export-type INCREMENTAL_EXPORT

Note :
- ExportFromTime here is the finish time of the full export, and ExportToTime is the current datetime calculated using the date +%s command
- The difference between ExportFromTime and ExportToTime cannot be less than 15 minutes

The catch with an incremental restore is that the export JSON format changes, and the OCI NoSQL Database Migrator cannot read it directly; we need to transform the records into a format that the migrator tool can read.
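Roughly speaking (values trimmed for illustration), a full export writes each item in the form the migrator expects, while an incremental export with the NEW_IMAGE view wraps the new item image in Keys/NewImage fields:

# Full export line - readable by the migrator
{"Item": {"PatientID": {"S": "..."}, "RecordDate": {"S": "..."}, "PatientName": {"S": "..."}}}

# Incremental export line (NEW_IMAGE view)
{"Keys": {"PatientID": {"S": "..."}, "RecordDate": {"S": "..."}}, "NewImage": {"PatientID": {"S": "..."}, "RecordDate": {"S": "..."}, "PatientName": {"S": "..."}}}

The transformation below simply pulls out NewImage and re-wraps it as Item.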

Download the files to your local machine and use the Python3 script below to transform the records.
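For example, assuming the incremental export was written under the exports_incremental/ prefix, the data files can be fetched and unpacked like this (the transformation script reads plain JSON and writes gzipped output):

# Copy the incremental export data files locally
aws s3 cp s3://patienthealthrecords-dynamodb-backup/exports_incremental/ . --recursive --exclude "*" --include "*.json.gz"

# Decompress them so the script can read them as plain text
gunzip *.json.gz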

import json
import sys
import gzip

def transform_json(input_file_path, output_file_path):
    # Read the incremental export file line by line, re-wrap each NewImage as an "Item"
    # record, and write the result as gzipped DynamoDB JSON that the migrator can read
    try:
        with open(input_file_path, 'r') as input_file, gzip.open(output_file_path, 'wt', encoding='utf-8') as output_file:
            for line in input_file:
                try:
                    original_json = json.loads(line)
                    transformed_record = {"Item": original_json["NewImage"]}
                    output_file.write(json.dumps(transformed_record) + "\n")
                except json.JSONDecodeError as e:
                    print(f"Skipping line due to JSONDecodeError: {str(e)}")
        print(f"Transformation complete. Output saved to: {output_file_path}")
    except Exception as e:
        print(f"An error occurred: {str(e)}")

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python3 script.py <input_file_path> <output_gzip_file_path>")
    else:
        input_file_path = sys.argv[1]
        output_file_path = sys.argv[2]
        transform_json(input_file_path, output_file_path)

Transform the original incremental export JSON file to DynamoDB JSON by running the Python3 script.

$ python3 dynamodb_incremental_to_s3_clean.py 3oga74lfdmyhzpomedgbb3jdoy.json clean.json.gz

Once the clean JSON files are generated, create a new configuration file for the incremental load where the source is 'file' and the sink is 'nosqldb_cloud':

{
  "source" : {
    "type" : "file",
    "format" : "dynamodb_json",
    "dataPath" : "/home/opc/nosql-migrator-1.6.0/clean.json.gz"
  },
  "sink" : {
    "type" : "nosqldb_cloud",
    "endpoint" : "ap-melbourne-1",
    "table" : "PatientHealthRecords",
    "compartment" : "Shadab",
    "schemaInfo" : {
      "defaultSchema" : true,
      "readUnits" : 10,
      "writeUnits" : 10,
      "DDBPartitionKey" : "PatientID:String",
      "DDBSortKey" : "RecordDate:Timestamp(5)",
      "storageSize" : 1
    },
    "credentials" : "/home/opc/.oci/config",
    "credentialsProfile" : "DEFAULT",
    "writeUnitsPercent" : 90,
    "requestTimeoutMs" : 5000
  },
  "abortOnError" : true,
  "migratorVersion" : "1.0.0"
}

Now run the migrator tool again with the new config file to load the incremental records into the OCI NoSQL table

./runMigrator --config config_PatientHealthRecords_newrecords.json

We have now finished doing a full restore and an incremental restore of the DynamoDB table to Oracle NoSQL Database Cloud Service.


Cloud Solutions Architect@Oracle (The statements and opinions expressed here are my own & do not necessarily represent those of my employer)