Migrate Your Amazon DynamoDB Table to Oracle NoSQL Database Cloud Service
Migrate your DynamoDB tables to OCI NoSQL Database Cloud Service using the OCI NoSQL Database Migrator tool
Oracle NoSQL Database Cloud Service (NDCS) is a key-value database like Amazon DynamoDB: it is schema-less, schema-flexible, and horizontally scalable. It requires no provisioning of servers or instance types; you simply create a table and you're ready to read and write to it with single-digit millisecond latency. Oracle NDCS supports both on-demand and provisioned capacity with storage-based provisioning, and it supports JSON, table, and key-value data models.
To help customers migrate from other NoSQL databases, Oracle offers the NoSQL Database Migrator utility, which can migrate your DynamoDB tables to OCI NDCS using an S3 bucket.
In this blog post, we will walk through the steps to migrate your DynamoDB table to OCI NoSQL Database Cloud Service using the Migrator utility and a bit of Python3 code.
High-level Steps to Migrate DynamoDB Table to OCI NDCS
1. Download and unzip the NoSQL Database Migrator tool
2. Create OCI and AWS credentials by installing aws-cli and oci-cli
3. Create config.json with source and sink information
4. Run the Migrator utility
5. Check the OCI NoSQL table after the import
1. AWS DynamoDB Table Setup
For this demonstration we will use a sample table called PatientHealthRecords in DynamoDB. PatientHealthRecords has a partition key on PatientID and a sort key on RecordDate.
In DynamoDB, a table is a collection of items, and each item is a collection of attributes. Each item in the table has a unique identifier, or a primary key. Other than the primary key, the table is schema-less. Each item can have its own distinct attributes.
Partition key — A simple primary key, composed of one attribute known as the partition key. DynamoDB uses the partition key's value as input to an internal hash function. The output from the hash function determines the partition in which the item will be stored.
Partition key and sort key — As a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key. DynamoDB uses the partition key value as input to an internal hash function. The output from the hash function determines the partition in which the item will be stored. All items with the same partition key value are stored together, in sorted order by sort key value.
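To build intuition, here is a toy sketch of how a partition key routes items to partitions. This is an illustration only; DynamoDB's actual hash function is internal and not exposed.

```python
import hashlib

def toy_partition(partition_key: str, num_partitions: int = 4) -> int:
    """Toy stand-in for DynamoDB's internal hash: map a partition key to a partition number."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Items with the same PatientID always land in the same partition,
# where they are stored sorted by the sort key (RecordDate).
items = [
    {"PatientID": "p-001", "RecordDate": "2024-01-05"},
    {"PatientID": "p-001", "RecordDate": "2024-02-11"},
    {"PatientID": "p-002", "RecordDate": "2024-01-09"},
]
for item in items:
    print(item["PatientID"], "-> partition", toy_partition(item["PatientID"]))
```

The key property is determinism: the two p-001 records hash to the same partition, which is what makes sort-key range queries within a partition key efficient.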
In contrast, Oracle NoSQL tables support flexible data models with both schema and schema-less design.
Let's create the DynamoDB table with aws-cli:
aws dynamodb create-table \
--table-name PatientHealthRecords \
--attribute-definitions \
AttributeName=PatientID,AttributeType=S \
AttributeName=RecordDate,AttributeType=S \
--key-schema \
AttributeName=PatientID,KeyType=HASH \
AttributeName=RecordDate,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST
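If you prefer Python, the same table definition can be expressed with boto3. This is a sketch mirroring the aws-cli command above; the live `create_table` call is shown commented out because it requires boto3 and configured AWS credentials.

```python
def patient_table_spec():
    """Table definition mirroring the aws-cli create-table command above."""
    return {
        "TableName": "PatientHealthRecords",
        "AttributeDefinitions": [
            {"AttributeName": "PatientID", "AttributeType": "S"},
            {"AttributeName": "RecordDate", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PatientID", "KeyType": "HASH"},    # partition key
            {"AttributeName": "RecordDate", "KeyType": "RANGE"},  # sort key
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }

# To actually create the table (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("dynamodb").create_table(**patient_table_spec())

print(patient_table_spec()["TableName"])
```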
Let's seed some records into the DynamoDB table using a Python3 script:
import boto3
import faker
import sys

def generate_data(size):
    """Generate `size` fake patient records."""
    fake = faker.Faker()
    records = []
    for _ in range(size):
        record = {
            'PatientID': fake.uuid4(),
            'RecordDate': fake.date(),
            'PatientName': fake.name(),
            'Age': fake.random_int(min=0, max=100),
            'Gender': fake.random_element(elements=('Male', 'Female', 'Other')),
            'Diagnosis': fake.sentence(),
            'Treatment': fake.sentence(),
            'DoctorID': fake.uuid4()
        }
        records.append(record)
    return records

def write_data_in_chunks(table_name, data, chunk_size):
    """Write records to DynamoDB with the batch writer, chunk_size items at a time."""
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table(table_name)
    for i in range(0, len(data), chunk_size):
        with table.batch_writer() as batch:
            for record in data[i:i+chunk_size]:
                batch.put_item(Item=record)
    print(f"Successfully wrote {len(data)} records to {table_name} in chunks of {chunk_size}.")

if __name__ == "__main__":
    table_name = 'PatientHealthRecords'
    num_records = int(sys.argv[1]) if len(sys.argv) > 1 else 1000  # total records to generate
    data = generate_data(num_records)
    write_data_in_chunks(table_name, data, chunk_size=100)
Run the script to seed the table
python3 load_dynamodb_table.py 1000
Check the records from the AWS DynamoDB console
2. Setup the OCI NoSQL Database Migrator Tool on Oracle Linux VM on OCI
a. Download the utility from https://www.oracle.com/database/technologies/nosql-database-server-downloads.html
b. Upload it to an Oracle Linux 7 or Oracle Linux 8 VM and unzip it
cd $HOME
unzip nosql-migrator-1.6.0.zip
c. Install awscli and configure your AWS credentials
$ sudo yum install awscli
$ aws configure
d. Install ocicli and add your API key to the instance, or use instance principal authentication
$ sudo yum install python36-oci-cli
$ oci setup config
Add your API key details to the configuration file:
[DEFAULT]
user=ocid1.user.oc1..<unique_ID>
fingerprint=<your_fingerprint>
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..<unique_ID>
region=us-ashburn-1
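Before moving on, a quick pre-flight check can confirm both credential files are in place. This is an illustrative helper, not part of the migrator; the default aws-cli and oci-cli paths below are the usual ones but may differ on your machine.

```python
import os

def check_credentials(paths):
    """Return the subset of expected credential files that are missing."""
    return [p for p in paths if not os.path.isfile(os.path.expanduser(p))]

# Default locations written by `aws configure` and `oci setup config`
missing = check_credentials(["~/.aws/credentials", "~/.oci/config"])
print("Missing credential files:", missing or "none")
```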
3. Export DynamoDB Table to S3 Bucket
a. Create an S3 bucket named patienthealthrecords-dynamodb-backup
b. Enable Point-in-Time Recovery (PITR) for your DynamoDB table
c. Export the table to S3
After the export is completed, you will see the structure below in your bucket
4. Create config File for OCI NoSQL Database Migrator and Run the Migration Utility
$ cd nosql-migrator-1.6.0/
$ vim config_PatientHealthRecords.json
Add the source and sink definition here
Note: On the machine where you run the OCI NoSQL Database Migrator tool, both aws-cli and oci-cli must be configured with their respective credentials so the migrator can access both the DynamoDB export in S3 and OCI NoSQL Database Cloud Service
{
  "source" : {
    "type" : "aws_s3",
    "format" : "dynamodb_json",
    "s3URL" : "https://patienthealthrecords-dynamodb-backup.s3.ap-southeast-2.amazonaws.com/AWSDynamoDB/01710053125774-99b0e7c9/data",
    "credentials" : "/home/opc/.aws/credentials",
    "credentialsProfile" : "default"
  },
  "sink" : {
    "type" : "nosqldb_cloud",
    "endpoint" : "ap-melbourne-1",
    "table" : "PatientHealthRecords",
    "compartment" : "Shadab",
    "schemaInfo" : {
      "defaultSchema" : true,
      "readUnits" : 10,
      "writeUnits" : 10,
      "DDBPartitionKey" : "PatientID:String",
      "DDBSortKey" : "RecordDate:Timestamp(5)",
      "storageSize" : 1
    },
    "credentials" : "/home/opc/.oci/config",
    "credentialsProfile" : "DEFAULT",
    "writeUnitsPercent" : 90,
    "requestTimeoutMs" : 5000
  },
  "abortOnError" : true,
  "migratorVersion" : "1.0.0"
}
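A few lines of Python can sanity-check the config before launching the migrator. This is a hypothetical helper for catching typos early; the required-key sets below are drawn from the config shown above, and the migrator performs its own full validation regardless.

```python
import json

# Keys taken from the source/sink definitions used in this post
REQUIRED_SOURCE_KEYS = {"type", "format", "s3URL", "credentials"}
REQUIRED_SINK_KEYS = {"type", "endpoint", "table", "compartment", "credentials"}

def validate_config(cfg: dict) -> list:
    """Return dotted paths of missing required keys; an empty list means the basics look OK."""
    missing = [f"source.{k}" for k in REQUIRED_SOURCE_KEYS - set(cfg.get("source", {}))]
    missing += [f"sink.{k}" for k in REQUIRED_SINK_KEYS - set(cfg.get("sink", {}))]
    return missing

# Usage against the real file:
#   with open("config_PatientHealthRecords.json") as f:
#       print(validate_config(json.load(f)) or "config looks OK")

demo = {"source": {"type": "aws_s3"}, "sink": {}}
print(validate_config(demo))
```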
For more information about mapping a DynamoDB table to OCI NoSQL, see this page:
There are two different ways of modeling a DynamoDB table in OCI NoSQL:
[1] Modeling the DynamoDB table as a JSON document (recommended): In this model, you map all the attributes of the DynamoDB table into a JSON column of the NoSQL table, except the partition key and sort key.
[2] Modeling the DynamoDB table as fixed columns in the NoSQL table: In this model, you create a column in the NoSQL table for each attribute of the DynamoDB table, and you model the partition key and sort key attributes as the primary key. Use this approach only when you are certain that the DynamoDB table schema is fixed and each item has values for most of the attributes.
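As a concrete illustration of the two models, here is the same DynamoDB item mapped both ways as plain Python dicts. The `document` column name is hypothetical; the migrator chooses the actual JSON column name.

```python
# A DynamoDB item from PatientHealthRecords
item = {
    "PatientID": "p-001",
    "RecordDate": "2024-01-05",
    "PatientName": "Jane Doe",
    "Age": 42,
    "Diagnosis": "Example diagnosis.",
}

key_attrs = ("PatientID", "RecordDate")

# [1] JSON-document model: the key attributes become columns,
#     everything else folds into a single JSON column
json_model_row = {
    "PatientID": item["PatientID"],
    "RecordDate": item["RecordDate"],
    "document": {k: v for k, v in item.items() if k not in key_attrs},
}

# [2] Fixed-column model: every attribute becomes its own column
fixed_model_row = dict(item)

print(json_model_row)
print(fixed_model_row)
```

The JSON-document model tolerates items with differing attributes, which is why it is the recommended choice for schema-less source tables.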
Run the database migrator
./runMigrator --config config_PatientHealthRecords.json
Important Note: The table DDL in this case is automatically created by the migrator utility.
5. Check the Table and Records on OCI NoSQL Database
All attributes of a DynamoDB table item, except the partition and sort key, are aggregated into a single NoSQL JSON column
Incremental Restore
You might have a live DynamoDB table with transactions being written to it while you're exporting it to S3. In that case, in addition to a full restore, you can do an incremental export and import
Initiate an incremental export of the DynamoDB table to the S3 bucket
Create a new folder called new_records in the same bucket or a new bucket
Check the folder structure after the incremental export completes; the data/ folder contains the exported gz files
You can also do a consistent full export and incremental export of DynamoDB to your S3 bucket using awscli
# Calculate Unix Epoch time in milliseconds
date +%s
1710374718
# Full export
aws dynamodb export-table-to-point-in-time \
--table-arn arn:aws:dynamodb:REGION:ACCOUNT:table/TABLENAME \
--s3-bucket bucketname \
--s3-prefix exports/ \
--s3-sse-algorithm AES256 \
--export-time 1710374718
# Incremental export, starting at the end time of the full export
aws dynamodb export-table-to-point-in-time \
--table-arn arn:aws:dynamodb:REGION:ACCOUNT:table/TABLENAME \
--s3-bucket bucketname \
--s3-prefix exports_incremental/ \
--incremental-export-specification ExportFromTime=1710374718,ExportToTime=1710374998,ExportViewType=NEW_IMAGE \
--export-type INCREMENTAL_EXPORT
Note:
- ExportFromTime is the finish time of the full export, and ExportToTime is the current datetime calculated using the date +%s command
- The difference between ExportFromTime and ExportToTime cannot be less than 15 minutes
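The same timestamps can be computed and checked in Python. This is a small sketch of the 15-minute constraint noted above; the timestamp values are the ones used in this post.

```python
import time

MIN_EXPORT_WINDOW_SECONDS = 15 * 60  # the export window must span at least 15 minutes

def valid_export_window(export_from: int, export_to: int) -> bool:
    """True if the incremental export window satisfies the minimum span."""
    return export_to - export_from >= MIN_EXPORT_WINDOW_SECONDS

export_from = 1710374718      # finish time of the full export (Unix epoch seconds)
export_to = int(time.time())  # equivalent of `date +%s`
print("window OK:", valid_export_window(export_from, export_to))
```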
The catch with an incremental restore is that the JSON format changes, and the OCI NoSQL Database Migrator cannot read it directly; we need to transform the records into a format the migrator tool can read.
Download the files to your local machine and use the Python3 script below to transform the records:
import json
import sys
import gzip

def transform_json(input_file_path, output_file_path):
    try:
        with open(input_file_path, 'r') as input_file, \
             gzip.open(output_file_path, 'wt', encoding='utf-8') as output_file:
            for line in input_file:
                try:
                    original_json = json.loads(line)
                    transformed_record = {"Item": original_json["NewImage"]}
                    output_file.write(json.dumps(transformed_record) + "\n")
                except json.JSONDecodeError as e:
                    print(f"Skipping line due to JSONDecodeError: {str(e)}")
        print(f"Transformation complete. Output saved to: {output_file_path}")
    except Exception as e:
        print(f"An error occurred: {str(e)}")

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python3 script.py <input_file_path> <output_gzip_file_path>")
    else:
        input_file_path = sys.argv[1]
        output_file_path = sys.argv[2]
        transform_json(input_file_path, output_file_path)
Transform the original incremental export JSON file to DynamoDB JSON by running the Python3 script.
$ python3 dynamodb_incremental_to_s3_clean.py 3oga74lfdmyhzpomedgbb3jdoy.json clean.json.gz
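To see what the script does, here is the transformation applied to a single, simplified incremental-export line (the sample record is hypothetical): each record's NewImage is rewrapped as Item, the shape the migrator's dynamodb_json format expects.

```python
import json

# A simplified line from an incremental export taken with ExportViewType=NEW_IMAGE
incremental_line = json.dumps({
    "Keys": {"PatientID": {"S": "p-001"}, "RecordDate": {"S": "2024-01-05"}},
    "NewImage": {
        "PatientID": {"S": "p-001"},
        "RecordDate": {"S": "2024-01-05"},
        "PatientName": {"S": "Jane Doe"},
    },
})

def transform_line(line: str) -> str:
    """Rewrap NewImage as Item, matching the full-export DynamoDB JSON shape."""
    return json.dumps({"Item": json.loads(line)["NewImage"]})

print(transform_line(incremental_line))
```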
Once the clean JSON files are generated, create a new configuration file for incremental load where the source is ‘file’ and sink is ‘nosqldb_cloud’
{
  "source" : {
    "type" : "file",
    "format" : "dynamodb_json",
    "dataPath" : "/home/opc/nosql-migrator-1.6.0/clean.json.gz"
  },
  "sink" : {
    "type" : "nosqldb_cloud",
    "endpoint" : "ap-melbourne-1",
    "table" : "PatientHealthRecords",
    "compartment" : "Shadab",
    "schemaInfo" : {
      "defaultSchema" : true,
      "readUnits" : 10,
      "writeUnits" : 10,
      "DDBPartitionKey" : "PatientID:String",
      "DDBSortKey" : "RecordDate:Timestamp(5)",
      "storageSize" : 1
    },
    "credentials" : "/home/opc/.oci/config",
    "credentialsProfile" : "DEFAULT",
    "writeUnitsPercent" : 90,
    "requestTimeoutMs" : 5000
  },
  "abortOnError" : true,
  "migratorVersion" : "1.0.0"
}
Now run the migrator tool again with the new config file to load the incremental records into the OCI NoSQL table
./runMigrator --config config_PatientHealthRecords_newrecords.json
With that, we have completed both a full restore and an incremental restore of a DynamoDB table to OCI NoSQL Database Cloud Service.